LONG ARITHMETIC PROGRESSIONS IN SUMSETS: 
THRESHOLDS AND BOUNDS 

E. SZEMERÉDI AND V. VU

Abstract. For a set $A$ of integers, the sumset $lA = A + \cdots + A$ consists of
those numbers which can be represented as a sum of $l$ elements of $A$:

$$lA = \{a_1 + \cdots + a_l \mid a_i \in A\}.$$

A closely related and equally interesting notion is that of $l^*A$, which is the
collection of numbers which can be represented as a sum of $l$ different elements
of $A$:

$$l^*A = \{a_1 + \cdots + a_l \mid a_i \in A,\ a_i \neq a_j\}.$$

The goal of this paper is to investigate the structure of $lA$ and $l^*A$, where
$A$ is a subset of $\{1, 2, \ldots, n\}$. As applications, we solve two conjectures of
Erdős and Folkman, posed in the sixties.

1. Overview

One of the main tasks of additive number theory is to examine structural properties
of sumsets. For a set $A$ of integers, the sumset $lA = A + \cdots + A$ consists of those
numbers which can be represented as a sum of $l$ elements of $A$:

$$lA = \{a_1 + \cdots + a_l \mid a_i \in A\}.$$

A closely related and equally interesting notion is that of $l^*A$, which is the collection
of numbers which can be represented as a sum of $l$ different elements of $A$:

$$l^*A = \{a_1 + \cdots + a_l \mid a_i \in A,\ a_i \neq a_j\}.$$
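The two definitions can be checked on small sets by brute force. The following is a minimal sketch, not part of the paper's method; the function names `sumset` and `distinct_sumset` are chosen here for illustration only.

```python
from itertools import combinations, combinations_with_replacement


def sumset(A, l):
    """lA: all sums of l elements of A, repetitions allowed."""
    return {sum(c) for c in combinations_with_replacement(sorted(set(A)), l)}


def distinct_sumset(A, l):
    """l*A: all sums of l pairwise different elements of A."""
    return {sum(c) for c in combinations(sorted(set(A)), l)}


A = {1, 2, 4}
print(sorted(sumset(A, 2)))           # [2, 3, 4, 5, 6, 8]
print(sorted(distinct_sumset(A, 2)))  # [3, 5, 6]
```

Since every sum of different elements is also a sum of elements, $l^*A$ is always a subset of $lA$; the brute-force enumeration is only practical for very small $|A|$ and $l$.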

Among the most well-known results in all of mathematics are Vinogradov's theorem,
which says that $3P$ ($P$ is the set of primes) contains all sufficiently large odd numbers,
and Waring's conjecture (proved by Hilbert, Hardy and Littlewood, Hua, and many
others), which asserts that for any given $r$ there is a number $l$ such that $lW$ ($W$
denotes the set of $r$-th powers) contains all sufficiently large positive integers (see
[29] for an excellent exposition concerning these results).

In recent years, a considerable amount of attention has been paid to the study of
finite sumsets. Given a finite set $A$ and a positive integer $l$, the natural analogue
of the Vinogradov-Waring results is to show that under proper conditions, the sumset
$lA$ ($l^*A$) contains a long arithmetic progression.

Let us assume that $A$ is a subset of the interval $[n] = \{1, \ldots, n\}$, where $n$ is a large
positive integer. The concrete problem we would like to address is to estimate the
minimum length of the longest arithmetic progression in $lA$ ($l^*A$) as a function
of $l$, $n$ and $|A|$. We denote this function by $f(|A|, l, n)$ ($f^*(|A|, l, n)$), following the
notation in [13]. Many estimates for $f(|A|, l, n)$ have been discovered by Bourgain,
Freiman, Halberstam, Green, Ruzsa, and Sárközy (see Section 3), but most of these
results focus on sets with very high density, namely $|A|$ close to $n$. Estimating
$f^*(|A|, l, n)$ seems much harder and not much was known prior to our study.

In this paper, we solve both problems almost completely for a wide range of $l$
and $|A|$. Our study reveals the surprising fact that the functions $f(|A|, l, n)$ and
$f^*(|A|, l, n)$ are not continuous and admit a threshold rule. We have successfully
located the threshold points within constant errors and established the asymptotic
behavior of the functions between consecutive threshold points. It has also turned
out, during our study, that the sum $l^*A$ is indeed fundamentally harder to attack
than its counterpart $lA$.

Central to our study is the development of a new, purely combinatorial, method.
This method is totally different from harmonic analysis methods used by most 
researchers and seems quite flexible. For instance, it is easy to extend our results 
in many directions. Moreover, the method carries us far beyond our original aim 
of estimating lengths of arithmetic progressions, leading to more general theorems 
about proper generalized arithmetic progressions. 

Our results also have some interesting applications. In particular, we settle two
forty-year-old conjectures of Erdős [8] and Folkman [14] (respectively) concerning
infinite arithmetic progressions.

Let us now present a brief introduction to the content of our paper: 

• In Section 2 we present the notion of GAPs and state Freiman's famous
inverse theorem, both of which play a crucial role in our study. In Section
3, we first describe some earlier results on the topic. Next, we present a
construction which suggests a conjecture about the length of the longest
arithmetic progression in $lA$. It is important to keep this construction
in mind, as it motivates many of our later arguments. The first main
result of Section 3 confirms the conjecture motivated by the construction.
This result, among others, reveals the surprising fact that $f(|A|, l, n)$ is
not continuous and admits a threshold behavior. There are many threshold
points and we are able to locate them within a constant factor. The
second main result, which refines the first one, provides a more general
and complete picture. We can prove that $lA$ not only contains long arithmetic
progressions, but also contains large proper generalized arithmetic
progressions (a regular arithmetic progression is a special proper generalized
arithmetic progression of rank one; we shall use the shorthand GAP
for generalized arithmetic progression). In the next section, Section 4, we
prove these two results. The first four subsections of Section 4 are devoted
to the development of a variety of tools, through which we establish
a connection between our study and inverse theorems of Freiman type. Exploiting
this connection, we complete the proofs in the final two subsections.
This concludes the first part of the paper.

• The second part of the paper consists of two sections, Section 5 and Section
6. In Section 5, we generalize the results in Section 3 to sums of different
sets. Instead of considering $lA$, we consider the sum $A_1 + \cdots + A_l$, where
$|A_1| = \cdots = |A_l| = |A|$. Thanks to the flexibility of our method, we can
extend the results of Section 3 to this setting in a relatively simple manner.
Also in this part we discuss an application which settles a conjecture
posed by Folkman in 1966. This conjecture was considered by Erdős and
Graham ([9], Section 6) to be the most important problem in the study of
subcomplete sequences. An infinite sequence is subcomplete if its partial sums
contain an infinite arithmetic progression. Folkman conjectured that a
sufficiently dense sequence of positive integers (with possible repetitions) is
subcomplete. In Section 6, we first work out a sufficient condition for
subcompleteness and next use the results in Section 5 to show that a sufficiently
dense sequence should satisfy this condition.

• Sections 7, 8 and 9 form the third part of the paper. This part contains our
strongest result, whose proof is also the most technical. The heart of this
part is Theorem 7.1, which extends the results in Section 3 to the sumset
$l^*A$. The proof comprises several phases. In the first phase, we prove a
structural property of a set $A$ where $l^*A$ does not contain a generalized
arithmetic progression as large as we desire. This property, which might
be of independent interest, shows that such a set $A$ contains a very rigid
subset which almost looks like a generalized arithmetic progression. The
verification of the structural lemma occupies most of Section 7. Section 8
contains the rest of the proof, whose core consists of an observation about
proper GAPs (subsection 8.3) and a variant of the so-called tiling technique,
introduced in an earlier paper [28]. Section 9 discusses a conjecture of
Erdős (posed in 1962) which is related to the above-mentioned conjecture
of Folkman. This conjecture was proved in an earlier paper [28] using a
special case of the main result in Section 7, but here we give a shorter
proof using the general condition worked out in Section 6. Several other
applications of the main result of this part will appear in future papers
[30, 31].

• The last part of the paper contains Section 10, in which we extend all 
previous results to finite fields. We assume that n is a prime and consider 
arithmetic progressions modulo n. This modification will lead to a natural 
change in the statement of the results, but the proofs remain basically the 
same. We conclude this part by mentioning an application concerning the 
problem of counting zero-sum-free sets. 

The paper contains several new technical ingredients, some of which (such as the
study of proper GAPs in Sections 3 and 8 and the rank reduction argument used
in Sections 3 and 4) may be of independent interest. Our writing benefits from
two earlier papers [27, 28], which established several partial results and laid
the foundation of our study. Many ideas from these two papers will be used here,
frequently in more general and more comprehensible forms.

2. Inverse Theorems 

A generalized arithmetic progression (GAP) of rank $d$ is a subset $Q$ of $\mathbb{Z}$ of the
following form: $\{a + \sum_{i=1}^{d} x_i a_i \mid 0 \le x_i \le n_i\}$; the product $\prod_{i=1}^{d} n_i$ is its volume and
we denote it by $\mathrm{Vol}(Q)$. In fact, as two different GAPs might represent the same
set, we always consider GAPs together with their structures. The set $(a_1, \ldots, a_d)$
is called the difference set of $Q$.
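To make the definition concrete, here is a small sketch that enumerates a GAP from its structure and computes its volume. The helper names `gap_elements` and `volume`, and the tuple representation of the structure, are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def gap_elements(a, diffs, sizes):
    """Elements a + sum(x_i * a_i) with 0 <= x_i <= n_i, as a set of integers."""
    box = product(*(range(n + 1) for n in sizes))
    return {a + sum(x * d for x, d in zip(xs, diffs)) for xs in box}


def volume(sizes):
    """Vol(Q): the product of the n_i."""
    return prod(sizes)


# Rank 2, base point 5, difference set (1, 10), sizes (3, 2).
Q = gap_elements(5, (1, 10), (3, 2))
print(sorted(Q))       # [5, 6, 7, 8, 15, 16, 17, 18, 25, 26, 27, 28]
print(volume((3, 2)))  # 6

# The same set of integers can come from a different structure,
# which is why a GAP is always considered together with its structure.
print(gap_elements(0, (1,), (7,)) == gap_elements(0, (1, 2), (3, 2)))  # True
```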

Freiman's famous inverse theorem [12] asserts that if $|A + A| \le c|A|$, where $c$
is a constant, then $A$ is a dense subset of a generalized arithmetic progression of
constant rank. In fact, the statement still holds in a slightly more general situation,
when one considers $A + B$ instead of $A + A$. This was shown by Ruzsa [24], who
gave a very elegant proof which was different from Freiman's.

Theorem 2.1. For every positive constant $c$ there is a positive integer $d$ and a
positive constant $k$ such that the following holds. If $A$ and $B$ are two subsets of $\mathbb{Z}$
with the same cardinality and $|A + B| \le c|A|$, then $A$ is a subset of a generalized
arithmetic progression $P$ of rank $d$ with volume at most $k|A|$.

The most recent estimate on $k$ (as a function of $c$) is due to Chang [5]. In our paper,
however, we shall be more concerned with the best value of $d$ (see Lemma 4.9 in
Section 4). The following result is a simple consequence of Freiman's theorem and
Plünnecke's theorem (for the statement of Plünnecke's theorem see, e.g., [24]).

Theorem 2.2. For every positive constant $c$ there is a positive integer $d$ and a
positive constant $k$ such that the following holds. If $A$ and $B$ are two subsets of $\mathbb{Z}$
with the same cardinality and $|A + B| \le c|A|$, then $A + B$ is a subset of a generalized
arithmetic progression $P$ of rank $d$ with volume at most $k|A|$.

For the special case when $c$ is relatively small, one can set $d = 1$. The following is
a consequence of another theorem of Freiman [12].

Lemma 2.3. The following holds for all sufficiently large $m$. If $A$ is a set of
integers of cardinality $m$ and $|A + A| \le 2.1m$, then $A$ is a subset of an arithmetic
progression of length $1.1m$.

Again, we can replace $A + A$ by $A + B$. The following is a corollary of a result by
Lev and Smeliansky (Theorem 6 of [19]).

Lemma 2.4. The following holds for all sufficiently large $m$. If $A$ and $B$ are two
sets of integers of cardinality $m$ and $|A + B| \le 2.1m$, then $A$ is a subset of an
arithmetic progression of length $1.1m$.

Both Lemmas 2.3 and 2.4 are relatively simple and do not require the inverse 
theorem to prove. 



3. Long arithmetic progressions in lA 

3.1. Some previous results. Problems concerning arithmetic progressions in sumsets
are non-trivial and not too many results are known. In the following, we describe
some of the main results in this area. Bourgain [3] proved that if $|A| = \delta n$,
where $\delta$ is a positive constant, then $2A$ contains an arithmetic progression of length
$e^{\epsilon \log^{1/3} n}$, where $\epsilon$ is a positive constant depending on $\delta$. Freiman, Halberstam and
Ruzsa [10] considered sumsets modulo a prime and proved the following.

Theorem 3.2. Let $n$ be a prime and $A$ a set of residues modulo $n$, $|A| = \gamma n$, where
$0 < \gamma < 1$ may depend on $n$. Let $l$ be a positive integer at least 3. Then $lA$ contains
an arithmetic progression (modulo n) of length 0(772^''''^" ^'). 

Notice that Theorem 3.2 is stated for any $\gamma$, but it is really efficient only when $\gamma$ is
relatively large. Indeed, if one wants to have 7nAT'^" ^' > 1 one needs to set 

Inn 

So Theorem 3.2 does not give a non-trivial bound in the case \A\= o(j^). Bour- 
gain's result and Theorem 3.2 have recently been improved by Green [16], but the 
applicable range does not change. 

Prior to our study, the only result (that we know of) which applies to sets with
relatively small cardinality is the following theorem, proved by Sárközy [25].

Theorem 3.3. There are positive constants $c$ and $C$ such that the following holds.
If $A$ is a subset of $[n]$ and $l$ is a positive integer such that $l|A| \ge Cn$, then $lA$
contains an arithmetic progression of length $cl|A|$.

Answering a question of Sárközy, Lev [20] showed that one can set $C$ equal to 2,
which is the optimal value.

It is clear that Theorem 3.3 is sharp, up to a constant factor. Let $A$ be the set of
all positive integers from 1 to $|A|$. Then $lA$ is the set of all positive integers from $l$
to $l|A|$.
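The sharpness example is easy to confirm numerically. The sketch below uses a brute-force search for the longest arithmetic progression in a finite set; the helpers `sumset` and `longest_ap` are illustrative names and are not taken from the paper.

```python
from itertools import combinations_with_replacement


def sumset(A, l):
    """lA: all sums of l elements of A, repetitions allowed."""
    return {sum(c) for c in combinations_with_replacement(sorted(set(A)), l)}


def longest_ap(S):
    """Length of the longest arithmetic progression contained in the finite set S."""
    elems = sorted(S)
    members = set(elems)
    best = min(len(elems), 2)
    for i, a in enumerate(elems):
        for b in elems[i + 1:]:
            d = b - a
            if a - d in members:
                continue  # this progression is counted from an earlier start
            length, x = 2, b + d
            while x in members:
                length, x = length + 1, x + d
            best = max(best, length)
    return best


k, l = 5, 3
A = range(1, k + 1)
S = sumset(A, l)
print(S == set(range(l, l * k + 1)))  # True
print(longest_ap(S))                  # 13, that is, l*k - l + 1
```

So for an interval the longest progression in $lA$ has length $l|A| - l + 1$, which matches the order $l|A|$ in Theorem 3.3.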

The main result of this section gives a sharp estimate for a wide range of $|A|$ and
$l$, including Theorem 3.3 as a special case. More importantly, our proof reveals the
structure of those sets $A$ whose sumsets $lA$ do not contain a very long arithmetic
progression. In the next subsection, we describe the construction that motivates
our result.

To conclude this subsection, let us mention that the proofs of all results mentioned
in this paper, with the exception of Sárközy's proof, are analytic, making heavy use
of harmonic analysis, and are very different from the proofs in this paper.

3.4. Sudden jumps. Our first crucial observation is that the statement of Theorem
3.3 ceases to hold when $l|A|$ becomes a little bit less than $n$. The following
construction shows that there is a set $A \subset [n]$ and a number $l$ such that $l|A| \approx n/4$
while the length of the longest arithmetic progression in $lA$ is only $O(l|A|^{1/2})$ (here
and later $\approx$ means "approximately").

The construction. Let $A = \{p_1x_1 + p_2x_2 \mid 1 \le x_i \le m\}$, where $p_1 \approx p_2 \approx \frac{n}{2m}$ are
two primes and $p_2 > m$. It is convenient to think of $A$ as a square in the two
dimensional lattice $\mathbb{Z}^2$. A point $(x_1, x_2)$ corresponds to the number $p_1x_1 + p_2x_2$. It
is easy to show that this correspondence is one to one. Indeed,

$$p_1x_1 + p_2x_2 = p_1x_1' + p_2x_2'$$

implies that

$$p_1(x_1 - x_1') = p_2(x_2' - x_2),$$

which is impossible for two different points because of divisibility and the fact that
$|x_1 - x_1'| < m < p_2$. Thus, $|A| = m^2$. Let $l = \frac{n}{(4+\epsilon)|A|} = \frac{n}{(4+\epsilon)m^2}$, where $\epsilon$ is an arbitrary positive
constant. We have

$$lA = \{p_1x_1 + p_2x_2 \mid l \le x_i \le lm\}.$$

Let $P$ be an AP in $lA$; we are going to show that the coordinates of the elements
of $P$ also form an AP of the same length. Thus $|P|$ is at most the length of an edge
of $lA$, which is less than $lm = l|A|^{1/2}$. Observe that

$$p_2 \approx \frac{n}{2m} > 2lm \quad \text{since} \quad l = \frac{n}{(4+\epsilon)|A|} = \frac{n}{(4+\epsilon)m^2}.$$

Consider three consecutive terms $z, z', z''$ in $P$, so that $z + z'' = 2z'$. Write $z = p_1x_1 + p_2x_2$,
$z' = p_1x_1' + p_2x_2'$ and $z'' = p_1x_1'' + p_2x_2''$. We have

$$(p_1x_1 + p_2x_2) + (p_1x_1'' + p_2x_2'') = 2(p_1x_1' + p_2x_2'),$$

which implies

$$p_1(x_1 + x_1'' - 2x_1') = -p_2(x_2 + x_2'' - 2x_2'),$$

which is again impossible, unless both sides vanish, as

$$|x_1 + x_1'' - 2x_1'| < 2lm < p_2.$$
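The two-dimensional construction can be tried out on toy parameters. The sketch below is only an illustration under assumed values $m = 2$, $l = 2$, $p_1 = 11$, $p_2 = 13$ (chosen so that $p_2 > 2lm$); these numbers and the helper names are not from the paper, and the search is brute force.

```python
from itertools import combinations_with_replacement


def sumset(A, l):
    """lA: all sums of l elements of A, repetitions allowed."""
    return {sum(c) for c in combinations_with_replacement(sorted(set(A)), l)}


def longest_ap(S):
    """Length of the longest arithmetic progression contained in the finite set S."""
    elems = sorted(S)
    members = set(elems)
    best = min(len(elems), 2)
    for i, a in enumerate(elems):
        for b in elems[i + 1:]:
            d = b - a
            if a - d in members:
                continue  # counted from an earlier starting point
            length, x = 2, b + d
            while x in members:
                length, x = length + 1, x + d
            best = max(best, length)
    return best


def lattice_set(p1, p2, m):
    """A = {p1*x1 + p2*x2 : 1 <= x1, x2 <= m}."""
    return {p1 * x1 + p2 * x2 for x1 in range(1, m + 1) for x2 in range(1, m + 1)}


m, l, p1, p2 = 2, 2, 11, 13            # toy values with p2 > 2*l*m
A = lattice_set(p1, p2, m)
print(len(A) == m * m)                 # True: the correspondence is one to one
print(longest_ap(sumset(A, l)))        # 3, below l*m = 4
print(longest_ap(sumset(range(1, m * m + 1), l)))  # 7 for an interval of the same size
```

With the same cardinality $|A| = 4$, the interval gives a progression of length about $l|A|$ while the lattice set only gives one of length about $l|A|^{1/2}$, which is the jump the construction is meant to exhibit.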

Next, we generalize the above construction to higher dimensions. 

The general construction. Let d be a constant positive integer at least 2 and 6 be 
a small positive constant. Consider two numbers \A\ and I satisfying < 
^5^n. Wc shall construct a set A of cardinality \A\ such that the longest arithmetic 
progression in I A has length 

Set a = L^^pwJ and b = li^^^^^^'^-'^'' \- Set &i = 0,62 = 1 and if d > 3 then 
set $b_i = \lfloor b^{(i-2)/(d-1)} \rfloor$ for all $3 \le i \le d$. Finally set $a_i = a + b_i$. It is routine to
verify that for a sufficiently large $n$

(l-^/3)a^/(''-^) >2;|A|i/''. (1) 

Consider the set

$$A = \Big\{\sum_{i=1}^{d} a_i x_i \;\Big|\; 1 \le x_i \le |A|^{1/d}\Big\}$$

(for convenience we assume that $|A|^{1/d}$ is an integer). The term in the definition
of $a$ guarantees that $A$ is a subset of $[n]$. It is convenient to view both $A$ and $lA$
as $d$-dimensional integral boxes. The edges of $lA$ form arithmetic progressions of
length Similar to the case d = 2, we are going to prove the following two 

claims. 

Claim 3.5. lA does not contain an arithmetic progression of length larger than 



Claim 3.6. The cardinality of $A$ is $|A|$.

Proof of Claim 3.5. Consider an arithmetic progression $P$ in $lA$ and let $z, z', z''$ be
three consecutive elements of $P$. We have $z + z'' = 2z'$. Write $z = \sum_{i=1}^{d} a_i x_i$,
$z' = \sum_{i=1}^{d} a_i x_i'$ and $z'' = \sum_{i=1}^{d} a_i x_i''$; it follows that $\sum_{i=1}^{d} (x_i + x_i'' - 2x_i') a_i = 0$. Notice
that $1 \le x_i, x_i', x_i'' \le l|A|^{1/d}$, so $|x_i + x_i'' - 2x_i'| < 2l|A|^{1/d}$, for all $i$'s.

Next, we show that the diophantine equation $\sum_{i=1}^{d} a_i r_i = 0$ cannot have non-trivial
roots with small absolute values, namely, $|r_i| < 2l|A|^{1/d}$ cannot hold simultaneously
for all $i$'s. Consider a non-trivial root $(r_1, \ldots, r_d)$. There are two cases.

(I) $\sum_{i=1}^{d} r_i = 0$. By the definition of the $a_i$'s, it follows that $\sum_{i=1}^{d} r_i b_i = 0$ and $d$
should be at least 3. Let $j$ be the largest index where $r_j \neq 0$; it is easy to see that
$j \ge 3$. On the other hand, by the definition of the $b_i$'s, for any $j \ge 3$



'^'^'^ EiZlh- -II' 
where the last inequality is from (1). 
(II) $\sum_{i=1}^{d} r_i \neq 0$. In this case, it is obvious that



maxjnl > > (1 - 5)aV(''-i) > 2^^'^^. (3) 

By the previous facts, we can conclude that $x_i + x_i'' - 2x_i' = 0$ for all $i$'s. So for
each $i$, the $i$-th coordinates of the elements of $P$ form an arithmetic progression. This implies that the
length of P could be at most the length of the "edges" of A, which is □ 

From the previous proof, it is obvious that if $\sum_{i=1}^{d} a_i x_i = \sum_{i=1}^{d} a_i x_i'$ with $1 \le x_i, x_i' \le |A|^{1/d}$
for all $1 \le i \le d$, then $x_i = x_i'$ for all $i$'s. This implies that the cardinality of
$A$ is $|A|$, proving Claim 3.6.

This construction plays a very important role in the whole paper. It not only leads 
us to the statements of our theorems, but also motivates many of our arguments. 

The sudden jumps. For the sake of simplicity, let us consider $l$ and $n$ fixed and
view $f(|A|, l, n)$ as a function of $|A|$ (we call this function $g(|A|)$). The special case
d=2 shows that if \A\ < , then ^d^l) is upper bounded by l\A\^/'^. This and 
Theorem 3.3 imply that $g(|A|)$ admits a dramatic change in order of magnitude
somewhere near the point $n/l$. If $|A| \ge Cn/l$ for some sufficiently large constant $C$,
then $g(|A|)$ (up to a multiplicative constant) behaves like $l|A|$. On the other hand,
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if \A\ < then then g{\A\) is upper bounded by This indicates that 

$g(|A|)$ is not a continuous function and its behavior must follow a threshold rule.

The general construction suggests that $n/l$ is not the only threshold (a place where
$g(|A|)$ jumps). Assume, for a moment, that we could prove that close to the left of
$n/l$, $g(|A|)$ behaves like $l|A|^{1/2}$. This behavior, however, cannot continue to hold
with \A\ getting significantly smaller than n/l. Indeed, once |^| becomes less than 
i^jl then gd^l) is upper bounded by Thus, another threshold should 

occur around the point $n/l^2$. Motivated by this reasoning, one would conjecture that
there is a threshold around $n/l^d$ for any fixed positive integer $d$. To the right of the
threshold, $g(|A|)$ behaves like $l|A|^{1/d}$, while to the left it behaves like $l|A|^{1/(d+1)}$.

3.7. $g(|A|)$ must jump. Our first main result confirms the above conjecture.

Theorem 3.8. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. For any positive integers $n$ and $l$
and any set $A \subset [n]$ satisfying $l^d|A| \ge Cn$, $lA$ contains an arithmetic progression
of length $cl|A|^{1/d}$.

Corollary 3.9. For any fixed positive integer $d$ there are positive constants $C_1$, $C_2$,
$c_1$ and $c_2$ depending on $d$ and $\epsilon$ such that whenever $C_1\frac{n}{l^d} \le |A| \le C_2\frac{n}{l^{d-1}}$,

$$c_1 l|A|^{1/d} \le f(|A|, l, n) \le c_2 l|A|^{1/d}.$$

Let us again consider $f(|A|, l, n)$ as a function of $|A|$, assuming $n$ and $l$ are
fixed. It is more convenient to view $g(|A|)$ on a logarithmic scale. For this purpose,
let us define $x = \ln|A|$ and $y(x) = \ln g(|A|)$. Corollary 3.9 implies

Corollary 3.10. For any fixed positive integer $d$ there are constants $C_1$, $C_2$, $c_1$ and
$c_2$ depending on $d$ such that whenever $\ln n - d\ln l + C_1 \le x \le \ln n - (d-1)\ln l + C_2$,

$$\frac{x}{d} + \ln l + c_1 \le y(x) \le \frac{x}{d} + \ln l + c_2.$$

The values of the constants $C_1, C_2, c_1, c_2$ in this corollary are, of course, different
from the values of $C_1, C_2, c_1, c_2$ in Corollary 3.9. Corollary 3.10 determines the
value of $y(x)$ up to a constant additive term for all $x$ except for a few intervals of constant
length. An exceptional interval is a neighborhood of a threshold point $\ln n - d\ln l =
\ln\frac{n}{l^d}$ and is of the form $[\ln n - d\ln l + C_2(d-1), \ln n - d\ln l + C_1(d)]$, which has
length $C_1(d) - C_2(d-1)$. Here we write $C_1(d)$ and $C_2(d-1)$ instead of $C_1$ and $C_2$
to emphasize the dependence on $d$ and $d - 1$, respectively.

The above results locate the thresholds within constant factors. It would be nice 
to find the exact locations of these thresholds. 



Question. Find the exact values of the constants C and c in Theorem 3.8. 

The case $d = 1$ was treated by Lev in [20]. For general $d$, our construction shows
that $C(d)$ is at least $(1 - o(1))/2d$.



3.11. A stronger theorem about generalized arithmetic progressions. Theorem
3.8 is only the tip of an iceberg and we are going to extend it in various
directions. In the first extension, we show that Theorem 3.8 is a consequence of a
stronger theorem about GAPs.

In order to guess what we may say about the possible existence of GAPs in $lA$, let
us go back to the construction. Observe that the constructed sumset $lA$ contains
not only an arithmetic progression of length $\Omega(l|A|^{1/d})$ but also a proper GAP of rank
$d$ and cardinality $\Omega(l^d|A|)$. The arithmetic progression of length $\Omega(l|A|^{1/d})$ we talked
about is actually an edge of this GAP. Thus, our first guess is, naturally, that $lA$
contains a GAP of rank $d$ and cardinality $\Omega(l^d|A|)$. This guess is, nevertheless,
false. To see this, notice that if we let $A$ in the construction be a GAP of dimension
$d' < d$ with appropriate parameters, then $lA$ is a GAP of dimension $d'$ of cardinality
$\Omega(l^{d'}|A|)$, which is much less than $l^d|A|$ (it is interesting to note that in this case
$lA$ contains an arithmetic progression of length $\Omega(l|A|^{1/d'}) \gg \Omega(l|A|^{1/d})$). So, the
strongest statement one could make is that $lA$ contains a proper GAP of rank $d'$ and
cardinality $\Omega(l^{d'}|A|)$ for some integer $1 \le d' \le d$. This turns out to be the truth.

Theorem 3.12. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. For any positive integers $n$ and $l$
and any set $A \subset [n]$ satisfying $l^d|A| \ge Cn$, $lA$ contains a proper GAP of rank $d'$
and volume at least $cl^{d'}|A|$, for some integer $1 \le d' \le d$.

The other main results of this paper, Theorems 5.1, 7.1, 8.13 and 10.3, are extensions
of this theorem in various directions.

To conclude this subsection, let us point out that both Theorem 3.8 and Theorem
3.12 are invariant under affine transformations. Instead of assuming that $A$ is a
subset of $[n]$, we can assume that $A$ is a subset of an arithmetic progression of
length n. In fact, for technical reasons, we will frequently assume that A contains 



3.13. More about generalized arithmetic progressions. Consider a GAP
$Q = \{a + \sum_{i=1}^{d} x_i a_i \mid 0 \le x_i \le n_i\}$. It is convenient to consider $Q$ together with the
box $B_Q = \{(x_1, \ldots, x_d) \mid 0 \le x_i \le n_i\}$ of $d$-dimensional vectors and the following
map $\Phi$ from $\mathbb{Z}^d$ to $\mathbb{Z}$:

$$\Phi(x_1, \ldots, x_d) = \sum_{i=1}^{d} x_i a_i.$$

The volume of $Q$ is the geometrical volume of the $d$-dimensional box spanned by
$B_Q$:

$$\mathrm{Vol}(Q) = \mathrm{Vol}(B_Q) = \prod_{i=1}^{d} n_i.$$

We say that $Q$ is proper if $\Phi$ is injective on $B_Q$. In this case the cardinality of $Q$ is
$\prod_{i=1}^{d}(n_i + 1) = |B_Q|$. It is trivial that

$$|Q| \le 2^d\,\mathrm{Vol}(Q), \qquad (4)$$

and if $Q$ is proper then

$$\mathrm{Vol}(B_Q) \le |B_Q| \le 2^d\,\mathrm{Vol}(B_Q). \qquad (5)$$

If $Q$ is not proper, then there are two vectors $u$ and $w$ in $B_Q$ such that $\Phi(u) = \Phi(w)$.
The vector $v = u - w$ is called a vanishing vector. By linearity, it is clear that if $v$
is vanishing then $\Phi(v) = 0$ and $\Phi(v + u) = \Phi(u)$ for any $u \in \mathbb{Z}^d$.
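Properness and vanishing vectors can be explored by brute force on small boxes. The following sketch assumes a normal GAP (base point 0) given by its difference set and sizes; the function name `find_vanishing_vector` is hypothetical.

```python
from itertools import product


def find_vanishing_vector(diffs, sizes):
    """Return u - w for two different box points with the same image under
    the map x -> sum(x_i * a_i), or None if the GAP is proper."""
    seen = {}
    for xs in product(*(range(n + 1) for n in sizes)):
        value = sum(x * d for x, d in zip(xs, diffs))
        if value in seen:
            return tuple(x - y for x, y in zip(xs, seen[value]))
        seen[value] = xs
    return None


print(find_vanishing_vector((1, 10), (3, 2)))  # None: this GAP is proper
v = find_vanishing_vector((2, 3), (3, 2))
print(v)                                       # (3, -2)
print(sum(x * d for x, d in zip(v, (2, 3))))   # 0
```

In the second example the box points $(3, 0)$ and $(0, 2)$ both map to 6, so their difference $(3, -2)$ is a vanishing vector.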

In the following we specify some rules used in calculation involving GAPs. 

Addition. We only add two GAPs with the same difference set and the result is a
GAP with this difference set. For instance, if $P = \{a + a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le m_i\}$
and $Q = \{b + a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le n_i\}$ then

$$P + Q = \{(a + b) + a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le m_i + n_i\}.$$

Subtraction is defined similarly.

Multiplication. For a GAP $P$, we have $2P = P + P$ and $lP = (l - 1)P + P$.

Division. Consider a GAP $P = \{a + a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le m_i\}$. We say $P$ is
normal if $a = 0$. In this case, we define

$$\frac{1}{s}P = \{a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le m_i/s\}.$$

All of our arguments concerning GAPs are invariant with respect to affine transformations
(shifts in particular), so we could (and shall) automatically assume that
a GAP is normal when it is involved in division.
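These rules act on the structure of a GAP rather than on its underlying set. A minimal sketch of such a structure is given below; the class `GAP` and its method names `times` and `divide` are assumptions made for illustration, and division is only allowed here when every edge length is divisible by $s$.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GAP:
    base: int                 # the shift a
    diffs: Tuple[int, ...]    # the difference set (a_1, ..., a_d)
    sizes: Tuple[int, ...]    # the edge lengths (n_1, ..., n_d)

    def __add__(self, other):
        # Addition is only defined for GAPs with the same difference set.
        if self.diffs != other.diffs:
            raise ValueError("GAPs must share the same difference set")
        sizes = tuple(m + n for m, n in zip(self.sizes, other.sizes))
        return GAP(self.base + other.base, self.diffs, sizes)

    def times(self, l):
        # lP = (l - 1)P + P, unrolled: the shift and every edge scale by l.
        return GAP(l * self.base, self.diffs, tuple(l * m for m in self.sizes))

    def divide(self, s):
        # (1/s)P for a normal GAP whose edge lengths are divisible by s.
        if self.base != 0:
            raise ValueError("division assumes a normal GAP (a = 0)")
        if any(m % s for m in self.sizes):
            raise ValueError("edge lengths must be divisible by s")
        return GAP(0, self.diffs, tuple(m // s for m in self.sizes))


P = GAP(1, (2, 7), (4, 2))
print(P + P == P.times(2))              # True
print(GAP(0, (2, 7), (4, 2)).divide(2)) # GAP(base=0, diffs=(2, 7), sizes=(2, 1))
```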



3.14. Some simple tricks. In this subsection, we describe several simple tricks
which we use frequently throughout the paper.

As $C$ can be set arbitrarily large, we can sacrifice constant factors in many arguments.
So we are going to make several assumptions, whose "prices" are only constant
factors, which are very convenient for the proofs.

Divisibility. By increasing the value of $C$, we can assume that $l$ is a power of two.
Indeed, if we replace $l$ by the closest power of two, then the magnitude of $l$ decreases
by a factor of at most 2. Similarly, once we have a GAP of constant rank and all we care
about is the volume of this GAP, up to a constant factor, then we can assume that the lengths
of the edges are divisible by 2 (or by any fixed integer). This latter assumption is
convenient for divisions. For instance, whenever we need to divide a GAP $P$ by a
constant $s$, we shall always assume that the lengths of the edges of $P$ are divisible
by $s$.
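The divisibility trick for $l$ amounts to rounding down to a power of two, which loses a factor of less than 2. A short sketch follows; the function name is chosen for this illustration only.

```python
def largest_power_of_two_at_most(l):
    """Largest power of two p with p <= l, for a positive integer l."""
    if l < 1:
        raise ValueError("l must be a positive integer")
    return 1 << (l.bit_length() - 1)


for l in (1, 5, 8, 1000):
    print(l, largest_power_of_two_at_most(l))
# 1 1
# 5 4
# 8 8
# 1000 512
```

In every case the result $p$ satisfies $l/2 < p \le l$, so replacing $l$ by $p$ costs only a constant factor that can be absorbed into $C$.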

Passing to subsets. In many situations, it is useful to assume that a certain set, 
say X, has a certain property. On the other hand, we can only prove that X has 
a subset X' with the desired property. However, when X' has constant density in 
X, we can frequently assume that X has the desired property, again by increasing 
the value of C. 

A graph with small degrees contains a large independent set. A graph consists of
a set $V$ of vertices and a set $E$ of edges, where an edge is a pair of two different
vertices. The degree of a vertex $v$ is the number of edges containing $v$. If $(u, v)$ is an
edge, then $u$ is a neighbor of $v$ and vice versa. A subset of $V$ is called independent
if it does not contain any edge. We are going to use the following simple fact from
graph theory.

Fact 3.15. Let $G$ be a graph on $n$ vertices. Assume that any vertex of $G$ has degree
at most $d$. Then $G$ contains an independent set of size at least $n/(d + 1)$.

Proof. Let $I$ be a maximal independent set. Since $I$ is maximal, the vertices in $I$
together with their neighbors cover the vertex set of $G$. Since the vertices of $I$
have at most $d|I|$ neighbors, it follows that

$$d|I| + |I| \ge n,$$

proving the claim. □

The above fact implies that if $G$ does not contain an independent set of size $s$, then
$G$ has a vertex with degree at least $n/s$.
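The proof of Fact 3.15 is effectively an algorithm: any maximal independent set works, and one can be built greedily. The sketch below assumes vertices labelled $0, \ldots, n-1$ and edges given as pairs; the function name is illustrative.

```python
def greedy_independent_set(n, edges):
    """Build a maximal independent set by scanning vertices 0..n-1 in order."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    independent, blocked = [], set()
    for v in range(n):
        if v not in blocked:
            independent.append(v)      # v has no chosen neighbor so far
            blocked.add(v)
            blocked |= neighbors[v]    # neighbors of v can no longer be chosen
    return independent


# A cycle on 6 vertices: every degree is d = 2, so the bound is 6 / 3 = 2.
cycle = [(i, (i + 1) % 6) for i in range(6)]
print(greedy_independent_set(6, cycle))  # [0, 2, 4]
```

Each chosen vertex blocks itself and at most $d$ neighbors, so the scan cannot finish with fewer than $n/(d+1)$ chosen vertices.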

4. Proofs of Theorem 3.8 and Theorem 3.12 

This section has six subsections. In the first four subsections we develop a variety 
of tools. The proof of Theorem 3.8 and that of Theorem 3.12 are presented in the 
last two subsections. 

Let us start with a sketch of the proof of Theorem 3.8. Consider the sequence

$$A, 2A, 4A, \ldots, 2^iA, \ldots, lA$$

(without loss of generality we can assume that $l$ is a power of 2). Since $lA$ is a subset
of the interval $[ln]$, $|lA|$ is at most $ln$. This implies that the ratio $|2^{i+1}A|/|2^iA|$
cannot always be large. In particular, there is a constant $K$ such that $|2^{i+1}A| \le
K|2^iA|$ holds for some index $i$ less than $\log_2 l$. On the other hand, $2^{i+1}A = 2^iA +
2^iA$, so by applying Freiman's theorem we can deduce that $2^iA$ is a dense subset
of a GAP $P$ with constant rank.
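The first step of the sketch, looking for an index where the doubling ratio is small, can be illustrated as follows. This is only a numerical illustration with hypothetical helper names; it does not implement Freiman's theorem.

```python
def double(S):
    """S + S for a finite set of integers S."""
    elems = sorted(S)
    return {a + b for a in elems for b in elems}


def doubling_ratios(A, steps):
    """Ratios |2^(i+1) A| / |2^i A| for i = 0, ..., steps - 1."""
    ratios, current = [], set(A)
    for _ in range(steps):
        nxt = double(current)
        ratios.append(len(nxt) / len(current))
        current = nxt
    return ratios


def first_small_doubling(A, steps, K):
    """First index i with |2^(i+1) A| <= K * |2^i A|, or None if there is none."""
    for i, r in enumerate(doubling_ratios(A, steps)):
        if r <= K:
            return i
    return None


print(doubling_ratios(range(1, 6), 3))          # each ratio is below 2
print(doubling_ratios({1, 2, 4, 8, 16}, 1))     # [3.0]
print(first_small_doubling(range(1, 6), 3, 2))  # 0
```

The product of the ratios equals $|lA|/|A|$, which is at most $ln/|A|$; this is why the ratios cannot all be large.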

Let us assume, for a moment, that $2^iA$ has density one in $P$, namely, $2^iA = P$. Thus
$2^iA$ contains a long arithmetic progression $B$ of length at least $(\mathrm{Vol}\,P)^{1/\mathrm{rank}(P)}$. As
$i$ is less than $\log_2 l$, $lA$ contains an even longer arithmetic progression of length at
least $\frac{l}{2^i}|B|$.

In order to carry out this scheme, we first need to show that assuming $2^iA = P$
is not oversimplifying. This will be carried out in the second subsection, where we
show that at the cost of constant factors we can think of a dense subset of a GAP
as the whole set.

With the aid of this assertion, it is now not so hard to prove that $lA$ contains an
arithmetic progression of length $l|A|^{\epsilon}$ for some small $\epsilon$. In order to optimize $\epsilon$, we
need to optimize $K$ and the rank of $P$. The optimal value of $K$ is easy to guess
while the optimal value of the rank of $P$ will be provided by a result of Bilu [2],
which is a part of his proof of Freiman's theorem.

Now comes the last, and perhaps most intriguing point. Even with these optimal 
parameters, we could not obtain the bound claimed in the theorem (however, we 
can obtain a weaker theorem proved in an earlier paper [27]). To fill in the gap, we 
need to prove certain properties of non-proper and proper GAPs. These properties 
lead us to Lemma 4.13 which is the main lemma of the proof. The verification of this 
lemma requires the preparation carried out throughout the first three subsections. 

Now let us say something about the proof of Theorem 3.12. The first step is to
realize that we can assume that $2^iA$ is not only a GAP, but also a proper one.
The sumset $lA$ contains a multiple of this GAP. The trouble is that a multiple of a
proper GAP does not need to be proper. What saves us here is a technique called
"rank reduction". The heart of this technique is an argument which shows that
under certain circumstances a multiple of a proper GAP either is proper or contains
a proper GAP of strictly smaller rank and comparable cardinality. Thus if we fail
to complete our task in the first attempt, we can pass to a proper GAP with smaller
rank and make a new try. The GAP we start with has a constant rank so sooner
or later we must be done. The reader will notice that this approach, in spirit, is
consistent with the statement of Theorem 3.12, which confirms the existence of a
GAP of rank $d'$ where $d'$ is an undetermined quantity between 1 and $d$. This value
$d'$ is exactly where the rank reduction terminates.

4.1. A property of non-proper GAPs. Let us consider the ratio between the
cardinality and the volume of a GAP $P$. Assume that $P$ has the form $P = \{a +
a_1x_1 + \cdots + a_dx_d \mid 0 \le x_i \le n_i\}$, where all $n_i \ge 1$. The volume of $P$ is $\prod_{i=1}^{d} n_i$. If $P$ is
proper, then its cardinality is $\prod_{i=1}^{d}(n_i + 1)$ and the ratio in question is $\prod_{i=1}^{d}(1 + \frac{1}{n_i})$,
which is a number between 1 and $2^d$. For a non-proper GAP, it is safe to say that
the ratio is less than $2^d$, but it could still be larger than 1. We are going to show,
nevertheless, that if $P$ is a sufficiently large multiple of a non-proper GAP, then
this ratio is bounded from above by any fixed positive constant $\epsilon$.
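The behavior of this ratio under multiplication can be observed on a toy non-proper GAP. The sketch below takes $Q = \{x_1 + 2x_2 \mid 0 \le x_i \le 2\}$, an example chosen here and not taken from the paper, and computes $|gQ|/\mathrm{Vol}(gQ)$ by brute force; the helper names are hypothetical.

```python
from itertools import product
from math import prod


def cardinality_of_multiple(diffs, sizes, g):
    """|gQ| for the normal GAP Q with the given difference set and sizes."""
    box = product(*(range(g * n + 1) for n in sizes))
    return len({sum(x * d for x, d in zip(xs, diffs)) for xs in box})


def ratio(diffs, sizes, g):
    """|gQ| / Vol(gQ), where Vol(gQ) is the product of the g * n_i."""
    return cardinality_of_multiple(diffs, sizes, g) / prod(g * n for n in sizes)


diffs, sizes = (1, 2), (2, 2)   # non-proper: (2, 0) and (0, 1) have the same image
for g in (1, 2, 10):
    print(g, ratio(diffs, sizes, g))
# 1 1.75
# 2 0.8125
# 10 0.1525
```

For $g = 1$ the ratio is still larger than 1, but it drops quickly as $g$ grows, since $|gQ| = 6g + 1$ while $\mathrm{Vol}(gQ) = 4g^2$ in this example.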

Lemma 4.2. For any positive constants $\epsilon$ and $d$ there is a constant $g$ such that
the following holds. If a GAP $Q$ of rank $d$ is not proper, then $|gQ| \le \epsilon\,\mathrm{Vol}(gQ)$.

Moreover,

$$|2Q| \le \Big(1 - \frac{1}{2^{d+1}}\Big)|2B_Q|.$$

In the proof, we are going to use terminologies introduced in subsection 3.13. The 
reader may want to read this subsection again before checking the proof. 

Proof of Lemma 4.2. We can assume that $Q = \{x_1a_1 + \cdots + x_da_d \mid 0 \le x_i \le n_i\}$.
We consider $Q$ together with the box $B_Q$ and the canonical map $\Phi$ from $B_Q$ to $Q$.
Since $Q$ is not proper, there is a vanishing vector $v$ where $-n_i \le v_i \le n_i$ for all
$i = 1, \ldots, d$. Without loss of generality, we can assume that the first $d'$ coordinates
of $v$ are positive and the remaining ones are non-positive. Thus $0 < v_i \le n_i$ for
$i = 1, \ldots, d'$ and $-n_i \le v_i \le 0$ for $d' < i \le d$.

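Let $h < g$ be sufficiently large integers and let $B'$ be the set of vectors $w$ in
$gB_Q$ such that $w + v, w + 2v, \dots, w + hv$ are also in $gB_Q$. As $v$ is vanishing,
$\Phi(w) = \Phi(w + v) = \dots = \Phi(w + hv)$. It follows that

$$|gQ| \le |gB_Q \setminus B'| + \frac{1}{h+1}|B'| = |gB_Q| - \frac{h}{h+1}|B'|, \tag{6}$$

which implies

$$|gQ| \le |gB_Q|\Big(1 - \frac{h}{h+1}\frac{|B'|}{|gB_Q|}\Big) \le 2^d\mathrm{Vol}(gB_Q)\Big(1 - \frac{h}{h+1}\frac{|B'|}{|gB_Q|}\Big), \tag{7}$$

where in the last inequality we use the trivial fact that $|gB_Q| \le 2^d\mathrm{Vol}(gB_Q)$ (see
(4)). Next we bound $|B'|$ from below. A vector $w$ is surely in $B'$ if $0 \le w_i \le (g - h)n_i$
for $i \le d'$ and $hn_i \le w_i \le gn_i$ for $d' < i \le d$. Thus the cardinality of $B'$ is at least
$\prod_{i=1}^d ((g - h)n_i + 1)$. Moreover, $|gB_Q| \le \prod_{i=1}^d (gn_i + 1)$, so

$$\frac{h}{h+1}\frac{|B'|}{|gB_Q|} \ge \frac{h}{h+1}\prod_{i=1}^d \frac{(g-h)n_i + 1}{gn_i + 1}.$$

For any given $\epsilon$, $d$ we could choose $g$ and $h$ (depending only on $\epsilon$ and $d$) so that

$$2^d\Big(1 - \frac{h}{h+1}\prod_{i=1}^d \frac{(g-h)n_i + 1}{gn_i + 1}\Big) \le \epsilon$$

holds for any positive integers $n_i$. With this choice of $g$ and $h$, the rightmost
formula in (7) is thus at most $\epsilon\mathrm{Vol}(gQ)$, proving the first statement of the lemma.
To verify the second statement, set $g = 2$ and $h = 1$. We obtain

$$|2Q| \le |2B_Q|\Big(1 - \frac{1}{2}\prod_{i=1}^d \frac{n_i + 1}{2n_i + 1}\Big).$$

The product $\prod_{i=1}^d \frac{n_i+1}{2n_i+1}$ is larger than $\frac{1}{2^d}$, so it follows that

$$|2Q| \le \Big(1 - \frac{1}{2^{d+1}}\Big)|2B_Q|, \tag{10}$$

completing the proof. □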
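
As a sanity check of the second statement of Lemma 4.2, one can test the inequality by brute force on small GAPs of rank 2. The sketch below is an illustration only; the function names are ours and the GAPs are assumed to have base point 0, so that $2Q = \{\sum x_ia_i \mid 0 \le x_i \le 2n_i\}$ and $|2B_Q| = \prod (2n_i + 1)$.

```python
from itertools import product


def sumset_size(differences, bounds):
    """Number of distinct values sum x_i a_i with 0 <= x_i <= bounds[i]."""
    ranges = [range(b + 1) for b in bounds]
    return len({sum(x * a for x, a in zip(xs, differences)) for xs in product(*ranges)})


def box_size(bounds):
    """Number of lattice points in the box with the given edge bounds."""
    size = 1
    for b in bounds:
        size *= b + 1
    return size


def check_doubling_bound(differences, sizes):
    """None if Q is proper; otherwise whether |2Q| <= (1 - 2^-(d+1)) |2B_Q| holds."""
    if sumset_size(differences, sizes) == box_size(sizes):
        return None  # Q is proper, so the lemma makes no claim
    d = len(sizes)
    doubled = [2 * n for n in sizes]
    # Integer form of the inequality: 2^(d+1) |2Q| <= (2^(d+1) - 1) |2B_Q|.
    return 2 ** (d + 1) * sumset_size(differences, doubled) <= (2 ** (d + 1) - 1) * box_size(doubled)
```

Calling `check_doubling_bound((1, 2), (3, 3))` tests the non-proper GAP $\{x_1 + 2x_2 \mid 0 \le x_i \le 3\}$, while a proper GAP such as `((1, 10), (3, 3))` returns `None`.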



4.3. The proper filling lemma. In this subsection, we present several lemmas
which allow us to think of a dense subset of a GAP as the whole set, at the cost of
constant factors. The first such lemma was proved in [27].

Lemma 4.4. For any positive constant $\gamma$ and any positive integer $d$ there is a
constant positive integer $h$ and a positive constant $\gamma'$ depending on $\gamma$ and $d$ such
that the following holds. If $P$ is a generalized arithmetic progression of rank $d$
and $B$ is a subset of $P$ such that $|B| \ge \gamma\mathrm{Vol}(P)$, then $hB$ contains a generalized
arithmetic progression of rank $d$ with cardinality at least $\gamma'|B|$.

We call this lemma the "filling lemma", as our motivation is to fill out a complete
GAP. Next, we strengthen this lemma by adding a requirement that the GAP
contained in $hB$ must be proper.

Lemma 4.5. For any positive constant $\gamma$ and any positive integer $d$ there is a
constant positive integer $h$ and a positive constant $\gamma'$ depending on $\gamma$ and $d$ such
that the following holds. If $P$ is a generalized arithmetic progression of rank $d$ and
$B$ is a subset of $P$ such that $|B| \ge \gamma\mathrm{Vol}(P)$, then $hB$ contains a proper generalized
arithmetic progression of rank $d$ with cardinality at least $\gamma'|B|$.

We shall, naturally, refer to Lemma 4.5 as the "proper filling lemma". The proof 
of Lemma 4.5 combines Lemma 4.4 with the result of the previous subsection. 

Proof of Lemma 4.5. By Lemma 4.4, $hB$ contains a GAP $Q$ with cardinality
$\Omega(|B|)$. It suffices to show that $Q$ contains a proper GAP of the same rank with
cardinality $\Omega(|Q|)$. As $h = O(1)$, $\mathrm{Vol}(hP) = O(\mathrm{Vol}(P)) = O(|B|)$, so we can assume
that

$$|Q| \ge \gamma_1\mathrm{Vol}(hP) \tag{11}$$

for some positive constant $\gamma_1$.

Let $g$ be a large constant integer. Without loss of generality we can assume that
$Q = \{x_1a_1 + \dots + x_da_d \mid 0 \le x_i \le n_i\}$ and each $n_i$ is divisible by $g$. Let $\epsilon$ be a positive
constant smaller than $\gamma_1$ and consider the GAP $Q' = \frac{1}{g}Q$. If $Q'$ is proper then we
are done as

$$|Q'| \ge \mathrm{Vol}(Q') = \Omega(\mathrm{Vol}(Q)) = \Omega(|Q|).$$

We next show that $Q'$ is indeed proper given that $g$ is sufficiently large. Assume
otherwise. Choosing $g$ as in Lemma 4.2 we have

$$|Q| = |gQ'| \le \epsilon\mathrm{Vol}(gQ') = \epsilon\mathrm{Vol}(Q) \le \epsilon\mathrm{Vol}(hP) < \gamma_1\mathrm{Vol}(hP), \tag{12}$$

which contradicts (11). This completes the proof. □

4.6. $(\delta, d)$-sets. We begin this subsection with an important definition.

Definition 4.7. A set $A$ is a $(\delta, d)$-set if one can find a GAP $Q$ of rank $d$ such
that $B = Q \cap A$ satisfies $|B| \ge \delta\max\{|A|, \mathrm{Vol}(Q)\}$.

The filling lemmas tell us that a $(\delta, d)$-set (where both $\delta$ and $d$ are constant) can
be treated as a GAP of rank $d$, if we are allowed to sacrifice constant factors.

Lemma 4.8. For any positive constants $\delta$ and $d$ there are positive constants $g$ and
$\gamma$ such that the following holds. If $A$ is a $(\delta, d)$-set then $gA$ contains a proper GAP
of rank $d$ with cardinality at least $\gamma|A|$.

Now we are going to present another lemma, which supplies a sufficient condition
for a set to be a $(\delta, d)$-set. In order to motivate this lemma, let us go back to
Freiman's inverse theorem. Freiman's theorem shows that if $|A + A| \le c|A|$, then $A$
is a dense subset of a GAP $P$ of rank $d = d(c)$. As we mentioned at the beginning
of this section, the optimal value of $d$ is critical to us. Observe that if $A$ is a proper
GAP of dimension $d$, then $|A + A| \le 2^d|A|$. So, one may wonder whether one can
set $d = \lfloor\log_2 c\rfloor$. Unfortunately, Freiman's theorem is not true with this value of $d$
(the best known bound is $d = \lfloor c\rfloor$). On the other hand, if we can afford to sacrifice
constant factors, then we can actually obtain this optimal value of $d$. To be more
precise, if $|A + A| \le c|A|$, then a constant fraction of $A$ is contained in a GAP $P$ of
rank $d = \lfloor\log_2 c\rfloor$ with small volume. The following lemma is a consequence of
Theorem 1.3 of [2].

Lemma 4.9. For any positive constants $\epsilon$ and $d$ there is a positive constant $\delta$ such
that the following holds. If $|A + A| \le (2^d - \epsilon)|A|$ then $A$ is a $(\delta, d)$-set.

This lemma is a by-product of the proof of Freiman's theorem given by Bilu in [2].

4.10. Rank reduction. Now we are in a position to develop the so-called rank
reduction technique, mentioned earlier in the beginning of this section. This technique
plays an important role not only in the proofs of Theorems 3.8 and 3.12, but also
in the proof of Theorem 7.1.

The rank reduction technique allows us to pass from one GAP to another which has
strictly smaller rank and comparable cardinality. We are going to present several
lemmas which constitute the technique. The first lemma is as follows.

Lemma 4.11. For any positive constant $d$ there is a positive constant $\delta$ such that
the following holds. If a GAP $Q$ of rank $d$ is proper but $2Q$ is not, then $2Q$ is a
$(\delta, d-1)$-set.

Proof of Lemma 4.11. Applying the second statement of Lemma 4.2 to $2Q$ we
have that

$$|4Q| = |2(2Q)| \le \Big(1 - \frac{1}{2^{d+1}}\Big)|4B_Q| \le \Big(1 - \frac{1}{2^{d+1}}\Big)4^d|B_Q|, \tag{13}$$

where in the last inequality we used the fact that $|4B_Q| \le 4^d|B_Q|$. Since $Q$ is
proper, $|B_Q| = |Q|$. It follows that

$$|4Q| \le \Big(1 - \frac{1}{2^{d+1}}\Big)4^d|Q| \le (2^d - \gamma)^2|Q|, \tag{14}$$

for some constant $\gamma = \gamma(d)$. It follows that either $|2Q| \le (2^d - \gamma)|Q|$ or
$|4Q| \le (2^d - \gamma)|2Q|$. In the first case $Q$ is a $(\delta, d-1)$-set; in the second case $2Q$ is a
$(\delta, d-1)$-set (both statements follow immediately from Lemma 4.9). But $Q$ is a
translation of a subset of $2Q$, so in both cases $2Q$ is a $(\delta, d-1)$-set (notice that the
three $\delta$'s in the last two sentences might have different values). □

The previous lemma and Lemma 4.8 together yield 

Lemma 4.12. For any positive constant $d$ there are positive constants $g$ and $\gamma$
such that the following holds. If a GAP $Q$ of rank $d$ is proper but $2Q$ is not, then
$gQ$ contains a proper GAP of rank $(d-1)$ with cardinality at least $\gamma|Q|$.

We are now ready to present the main lemma of the proofs of Theorems 3.8 and
3.12.

Lemma 4.13. For any positive constants $\epsilon$ and $d$ there are positive constants $c$ and
$\gamma$ such that the following holds. Let $Q$ be a proper GAP of rank $d$ and assume that
there are positive integers $l_1 = 2^{s_1}$ and $m$ satisfying $l_1Q \subset [m]$ and $l_1^d|Q| \ge cm$.
Then there is a positive integer $l_1' = 2^{s_1'} \le l_1$ such that $l_1'Q$ contains a proper GAP
$Q'$ of rank $(d-1)$ where $|Q'| \ge \gamma(l_1')^d|Q|$.



Proof of Lemma 4.13. Consider the sets $Q_0 = Q$, $Q_i = 2Q_{i-1}$, for $i = 1, \dots, s_1 - h_1 = s_2$,
where $h_1$ is the largest integer satisfying $2^{(h_1+1)d} < c$. If $Q_i$ were proper for
all $i$, then $|Q_i| \ge \mathrm{Vol}(Q_i)$ and $\mathrm{Vol}(Q_i) = 2^d\mathrm{Vol}(Q_{i-1})$, and this would imply that

$$|Q_{s_2}| \ge \mathrm{Vol}(Q_{s_2}) = 2^{s_2d}\mathrm{Vol}(Q_0) \ge \frac{2^{s_2d}|Q|}{2^d} = \frac{l_1^d|Q|}{2^{(h_1+1)d}} \ge \frac{cm}{2^{(h_1+1)d}} > m, \tag{15}$$

which is impossible as we assume $l_1Q \subset [m]$. (In the second inequality we used
the fact that $\mathrm{Vol}(Q_0) = \mathrm{Vol}(Q) \ge \frac{|Q|}{2^d}$.) Therefore, there is some $i$ between 1 and
$s_2$ for which $Q_i$ is not proper. Let $j$ be the smallest such $i$. Thus, $Q_{j-1}$ is proper
and $Q_j = 2Q_{j-1}$ is not. By Lemma 4.12, there are constants $h_2$ and $\gamma_1$ such that
$h_2Q_{j-1}$ contains a proper GAP $Q'$ of rank $(d-1)$ with cardinality at least $\gamma_1|Q_{j-1}|$.
Without loss of generality we can assume that $h_2$ is a power of 2, $h_2 = 2^{h_3}$. By
increasing $c$, we can assume that $h_1 \ge h_3$, which guarantees that $l_1' = h_22^{j-1} \le l_1$.
The set $l_1'Q = h_2Q_{j-1}$ contains a proper GAP $Q'$ of rank $(d-1)$ and cardinality

$$|Q'| \ge \gamma_1|Q_{j-1}| \ge \gamma_12^{(j-1)d}\frac{|Q|}{2^d} = \frac{\gamma_1}{(2h_2)^d}(l_1')^d|Q| = \gamma(l_1')^d|Q|, \tag{16}$$

where $\gamma = \gamma_1/(2h_2)^d$, concluding the proof. □
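
The doubling sequence $Q_0, Q_1, Q_2, \dots$ used in this proof is easy to trace on a toy example. The sketch below is an illustration under our own assumptions (base point 0, hypothetical helper names); it finds the first index $j$ for which $Q_j = 2^jQ$ fails to be proper.

```python
from itertools import product


def gap_values(differences, bounds):
    """Values sum x_i a_i with 0 <= x_i <= bounds[i]."""
    ranges = [range(b + 1) for b in bounds]
    return {sum(x * a for x, a in zip(xs, differences)) for xs in product(*ranges)}


def is_proper(differences, bounds):
    """A GAP is proper when all points of its box give distinct values."""
    points = 1
    for b in bounds:
        points *= b + 1
    return len(gap_values(differences, bounds)) == points


def first_nonproper_doubling(differences, sizes, max_steps=6):
    """Smallest j >= 1 such that Q_j = 2^j Q is not proper, or None if none is found."""
    for j in range(1, max_steps + 1):
        bounds = [2 ** j * n for n in sizes]
        if not is_proper(differences, bounds):
            return j
    return None
```

For $Q = \{x_1 + 7x_2 \mid 0 \le x_i \le 3\}$ the GAPs $Q$ and $2Q$ are proper, while $4Q$ is not; in this example $4Q$ is the whole interval $\{0, \dots, 96\}$, that is, a GAP of rank 1, in line with the rank reduction described above.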

4.14. Proof of Theorem 3.8. Before starting the proof, let us mention that all
constants ($\gamma_1$, $\gamma_2$, etc.) in the proof depend on $d$, but do not depend on $C$. By setting
$C$ sufficiently large, we can satisfy all relations required between these constants.
Without loss of generality, we can assume $l$ is a power of two, $l = 2^s$, where $s$ is
sufficiently large. Consider the set sequence $A_0 = A$, $A_{i+1} = 2A_i$. We first need the
following fact, which asserts that for some $i$ significantly smaller than $s = \log_2 l$,
the ratio $|A_i|/|A_{i-1}|$ is not too large.

Fact 4.15. There is some $i \le \frac{d+1}{d+3/2}s$ such that $|A_i| \le 2^{d+3/2}|A_{i-1}|$.

Proof of Fact 4.15. Assume otherwise, then

$$\big|A_{\frac{d+1}{d+3/2}s}\big| > 2^{(d+3/2)\frac{d+1}{d+3/2}s}|A_0| = 2^{(d+1)s}|A| = l^{d+1}|A| \ge Cln, \tag{17}$$

a contradiction as $A_{\frac{d+1}{d+3/2}s}$ is a subset of $[ln]$ ($C$ is set to be larger than 1). The
proof of the claim is completed. □
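
The search for a doubling step with small growth can be sketched in a few lines of Python. This is an illustration only: the function names and the `max_steps` cutoff are ours, and the index convention is that the returned $i$ satisfies $|A_{i+1}| \le 2^{d+3/2}|A_i|$.

```python
def double(a):
    """The sumset 2A = A + A."""
    return {x + y for x in a for y in a}


def first_small_doubling(a, d, max_steps=12):
    """Return (i, sizes) for the first i with |A_{i+1}| <= 2^(d + 3/2) |A_i|.

    Here A_0 = A and A_{i+1} = 2 A_i, and sizes[k] = |A_k| for k = 0, ..., i + 1.
    Returns (None, sizes) if no such index is found within max_steps doublings.
    """
    threshold = 2 ** (d + 1.5)
    current = set(a)
    sizes = [len(current)]
    for i in range(max_steps):
        current = double(current)
        sizes.append(len(current))
        if sizes[i + 1] <= threshold * sizes[i]:
            return i, sizes
    return None, sizes
```

For example, `first_small_doubling({1, 2, 4, 8, 16}, 1)` stops at the very first doubling, since $|2A| = 15 \le 2^{5/2} \cdot 5$.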

Let $s_1$ be the first index where $|A_{s_1+1}| \le 2^{d+3/2}|A_{s_1}|$. Lemmas 4.8, 4.9 and 4.5
imply that there are constants $g_1$ and $\gamma_1$ depending only on $d$ such that $2^{g_1}A_{s_1}$
contains a proper GAP $Q$ of rank $d + 1$ and cardinality at least $\gamma_1|A_{s_1}|$. By the
definition of $s_1$,

$$|A_{s_1}| \ge 2^{(d+3/2)s_1}|A|,$$

so

$$|Q| \ge \gamma_12^{(d+3/2)s_1}|A|.$$

By setting $C$ sufficiently large, we can assume that $s$ is sufficiently large so that
$s \ge s_1 + g_1$ (notice that $s_1 \le \frac{d+1}{d+3/2}s$). This implies that $2^{s-s_1-g_1}Q$ is a subset of $lA$.

Next we apply Lemma 4.13 to $Q$ with $m = ln$, $l_1 = \frac{l}{2^{s_1+g_1}}$, and $d+1$ instead of $d$.
In order to verify the conditions of this lemma, observe that 



Again by assuming that $C$ is large, we could guarantee that the condition of Lemma
4.13 is met. Lemma 4.13 implies that we have a proper GAP $Q' \subset l_1'Q = 2^{s_1'}Q$ of
rank $d$ with cardinality at least

$$\gamma_22^{(d+1)s_1'}|Q| \ge \gamma_2\gamma_12^{(d+1)(s_1+s_1')}2^{s_1/2}|A|, \tag{19}$$

where $s_1 + s_1' \le s$. The GAP $P = 2^{s-s_1-s_1'}Q'$ is a subset of $2^sA = lA$ and its
volume is



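$$2^{(s-s_1-s_1')d}\,\mathrm{Vol}(Q') = \Omega\big(2^{(s-s_1-s_1')d}\gamma_1\gamma_22^{(d+1)(s_1+s_1')}2^{s_1/2}|A|\big) = \Omega\big(\gamma_1\gamma_22^{sd}|A|2^{3s_1/2+s_1'}\big).$$

Since $P$ has rank $d$, its longest edge forms an AP of length at least
$\mathrm{Vol}(P)^{1/d} = \Omega(2^s|A|^{1/d}) = \Omega(l|A|^{1/d})$,
completing the proof of Theorem 3.8. □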

Remark. The reader may notice that in this proof we used the estimate on the 
cardinality of Q', but did not use the fact that Q' is proper. The properness of Q' , 
however, is critical in the next proof. 



4.16. Proof of Theorem 3.12. Without loss of generality, we can assume that
$0 \in A$. Consider $Q'$ as in the proof of Theorem 3.8. Again by increasing $C$, we
may assume that $s - s_1 - s_1'$ is lower bounded by a sufficiently large constant.
Consider the GAP $Q'' = 2^{s-s_1-s_1'-g_2}Q'$, where $g_2$ is a large constant satisfying
$s - s_1 - s_1' - g_2 > 0$. Since $0 \in A$, $Q''$ is a subset of $lA$. Moreover, as $Q'$ and $Q''$
are of rank $d$, we have, using inequality (19), that

$$\mathrm{Vol}(Q'') \ge 2^{(s-s_1-s_1'-g_2)d}\gamma_2\gamma_12^{(d+1)(s_1+s_1')}2^{s_1/2}|A| = \gamma_1\gamma_22^{sd}2^{3s_1/2+s_1'-g_2d}|A|.$$



We are going to examine two cases: 



Case 1: $Q''$ is proper. In this case $lA$ contains the proper GAP $Q''$ of rank $d$ and
volume $\Omega(l^d|A|)$. So we are done by setting $d' = d$.

Case 2: $Q''$ is not proper. Now we make a crucial use of the fact that $Q'$ is proper.

The properness of $Q'$ implies that there is a positive integer $s_2 \le s - s_1 - s_1' - g_2 < s$
such that $\frac{1}{2^{s_2}}Q''$ is proper. As usual, we choose $s_2$ to be the smallest such integer,
which implies that $\frac{1}{2^{s_2-1}}Q'' = 2 \cdot \frac{1}{2^{s_2}}Q''$ is not proper. Applying Lemma 4.12 to $\frac{1}{2^{s_2}}Q''$
we obtain a GAP $Q'''$ of rank $d-1$ and volume

$$\Omega\Big(\mathrm{Vol}\Big(\frac{1}{2^{s_2}}Q''\Big)\Big) = \Omega\Big(\frac{1}{2^{s_2d}}\mathrm{Vol}(Q'')\Big) = \Omega\Big(\frac{l^d}{2^{s_2d}}|A|\Big).$$

Furthermore, there is a constant $g_3$ such that $Q'''' = 2^{s_2-g_3}Q'''$ is a subset of $lA$.
The GAP $Q''''$ has rank $d-1$ and volume

$$2^{(s_2-g_3)(d-1)}\mathrm{Vol}(Q''') = \Omega\Big(2^{s_2(d-1)}\frac{l^d}{2^{s_2d}}|A|\Big) = \Omega(2^{-s_2}l^d|A|).$$

Since $s \ge s_2$, $2^{-s_2} \ge 2^{-s} = l^{-1}$. Thus, the volume of $Q''''$ is $\Omega(l^{d-1}|A|)$. Now if $Q''''$
is proper then we are done by setting $d' = d-1$. Otherwise we repeat the analysis
of Case 2 to obtain a GAP of rank $d-2$ and so on. This repetition cannot continue
forever, so sooner or later we must obtain a proper GAP of some rank $d' \le d$ which
satisfies the claim of the theorem. □

5. Sums of different sets 

The goal of this section is to generalize the results in Section 3 by considering the
sum of different sets, instead of the sum of the same set. Given $l$ sets $A_1, \dots, A_l$,
we define

$$A_1 + \dots + A_l = \{a_1 + \dots + a_l \mid a_i \in A_i, 1 \le i \le l\}.$$

We obtain the following generalization of Theorem 3.12.

Theorem 5.1. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $A_1, \dots, A_l$ be subsets of $[n]$ of
size $|A|$ where $l$ and $|A|$ satisfy $l^d|A| \ge Cn$. Then $A_1 + \dots + A_l$ contains a GAP of
rank $d'$ and volume at least $cl^{d'}|A|$, for some integer $1 \le d' \le d$.

The following corollary generalizes Theorem 3.8.

Corollary 5.2. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $A_1, \dots, A_l$ be subsets of $[n]$
of size $|A|$ where $l$ and $|A|$ satisfy $l^d|A| \ge Cn$. Then $A_1 + \dots + A_l$ contains an
arithmetic progression of length $cl|A|^{1/d}$.

Corollary 5.2 has a nice application. In Section 6, we use this corollary to confirm
a conjecture of Folkman posed in 1966.

5.3. The basic idea. The basic idea behind the proof of Theorem 5.1 is the
following. Given the sets $A_1, \dots, A_l$ as in Theorem 5.1, we are going to show that there
are numbers $l'$, $n'$ and a set $A'$ such that

• $l'A'$ is a subset of $A_1 + \dots + A_l$; $A'$ is a subset of $[n']$.

• $l'$, $n'$, $|A'|$ satisfy the conditions of Theorem 3.12. (This can be done by
setting the constant $C$ in Theorem 5.1 much larger than the constant $C$ in
Theorem 3.12.)

• $(l')^{d'}|A'| = \Omega(l^{d'}|A|)$ for all $1 \le d' \le d$.

An application of Theorem 3.12 to the triple $(l', n', A')$ immediately implies the
statement of Theorem 5.1.

The proof of Theorem 5.1 uses a technical lemma, Lemma 5.8 below. This lemma
provides a sufficient condition for the existence of a sumset of the form $l'A'$ in a sumset
of different sets. The verification of this lemma requires extensions of the filling
lemmas described in Section 4. These extensions are the topic of the next subsection.

5.4. Filling with different sets. In the proof of Theorem 5.1, we shall need the
following lemma, which generalizes Lemma 4.4 the way Theorem 5.1 generalizes
Theorem 3.8. This lemma was proved in an earlier paper. For the reader's
convenience, we include the proof in Appendix A.

Lemma 5.5. For any positive constant $\gamma$ and positive integer $d$, there is a positive
constant $\gamma'$ and a positive integer $g$ such that the following holds. If $X_1, \dots, X_g$
are subsets of a generalized arithmetic progression $P$ of rank $d$ and $|X_i| \ge \gamma\mathrm{Vol}(P)$
then $X_1 + \dots + X_g$ contains a generalized arithmetic progression $Q$ of rank $d$ and
cardinality at least $\gamma'\mathrm{Vol}(P)$. Moreover, the distances of $Q$ are multiples of the
distances of $P$.

One can further strengthen this lemma by requiring $Q$ to be proper. The proof is
similar to the proof of the proper filling lemma, Lemma 4.5.

Lemma 5.6. For any positive constant $\gamma$ and positive integer $d$, there is a positive
constant $\gamma'$ and a positive integer $g$ such that the following holds. If $X_1, \dots, X_g$
are subsets of a generalized arithmetic progression $P$ of rank $d$ and $|X_i| \ge \gamma\mathrm{Vol}(P)$
then $X_1 + \dots + X_g$ contains a proper generalized arithmetic progression $Q$ of rank
$d$ and cardinality at least $\gamma'\mathrm{Vol}(P)$. Moreover, the distances of $Q$ are multiples of
the distances of $P$.

Later on, we shall refer to Lemmas 5.5 and 5.6 as the general filling and general
proper filling lemmas, respectively.

5.7. The main lemma of Theorem 5.1. We are now in a position to present and
prove the main lemma of the proof of Theorem 5.1.

Lemma 5.8. For every positive constant $c$ there are positive constants $\epsilon$ and $d$
depending on $c$ such that the following holds. If the sets $X_1, \dots, X_l$, each of
cardinality $|X|$, satisfy $|X_1 + X_i| \le c|X|$ for all $2 \le i \le l$, then there is a proper GAP $Q$
of rank at most $d$ and cardinality at least $\epsilon|X|$ and a number $l' \ge \epsilon l$ such that the
sum $X_1 + \dots + X_l$ contains a translation of $l'Q$.

Proof of Lemma 5.8. The condition $|X_1 + X_i| \le c|X|$ and Freiman's theorem
imply that $X_1$ is contained in a GAP $R$ with constant rank and volume $O(|X|)$.
Consider $X_i$, for some $2 \le i \le l$. We say that two elements $x$ and $y$ of $X_i$ are
equivalent if $x - y \in R - R$. It is trivial that if $x$ and $y$ are not equivalent then
$x + X_1$ and $y + X_1$ are disjoint sets. Since $|X_1 + X_i| \le c|X|$ where $|X| = |X_1| = |X_i|$,
the number of equivalence classes is at most $c$. It follows that there is a class with
cardinality $\Omega(|X|)$; let us call this class $Y_i$. $Y_i$ is a translation of a subset $Z_i$ (of
constant density) of $R$. The hidden constants in the asymptotic notations depend
on $c$.

Consider the sets $Z_2, \dots, Z_l$. These sets are subsets of $R$ and $|Z_i| \ge \gamma\mathrm{Vol}(R)$ for
some positive constant $\gamma$ depending on $c$. Let $g$ be a large constant integer. With
the exception of at most $g - 1$ sets, we partition the $Z_i$'s into $h = \lfloor(l - 1)/g\rfloor$
disjoint groups of size $g$: $G_1, \dots, G_h$. Thus each group $G_j$ contains $g$ sets, each
of which is a subset of $R$ with cardinality at least $\gamma\mathrm{Vol}(R)$ for some positive constant $\gamma$.
By setting $g$ sufficiently large, the general filling lemma (Lemma 5.5) applies and
shows that the sum of the sets in any group $G_j$ contains a proper GAP $Q_j$ of
cardinality $\Omega(\mathrm{Vol}(R))$. Moreover, the rank of $Q_j$ is the same as the rank of $R$ and
the differences of $Q_j$ are multiples of the differences of $R$.

Since $|Q_j| = \Omega(\mathrm{Vol}(R))$, there are only $O(1)$ choices for the difference set of $Q_j$
(for the definition of difference sets, see Section 2). Thus, a constant fraction of the
$Q_j$'s have the same difference set. Without loss of generality, we may assume that
these $Q_j$'s are $Q_1, Q_2, \dots, Q_{l_2}$ where $l_2 = \Omega(h)$.

Since $|Q_j| = \Omega(\mathrm{Vol}(R))$, the length of the $k$th edge of $Q_j$ is $\Omega(1)$ times the length
of the corresponding edge of $R$, for all $1 \le k \le \mathrm{rank}(R)$. Thus the lengths of
the $k$th edge of the $Q_j$'s are within a constant factor from each other, for all
$1 \le k \le \mathrm{rank}(R)$. This implies that the intersection of the boxes $B_{Q_1}, \dots, B_{Q_{l_2}}$
contains a box $B$ with volume $\Omega(\mathrm{Vol}(R))$ (for the definition of these boxes, see
subsection 3.13). Let $m_1, \dots, m_d$ be the lengths of the edges of $B$ and $(a_1, \dots, a_d)$
be the (common) set of differences of $Q_1, \dots, Q_{l_2}$. It follows that each of $Q_1, \dots, Q_{l_2}$
contains a translation of the proper GAP $Q = \{a_1x_1 + \dots + a_dx_d \mid 0 \le x_i \le m_i\}$ ($Q$ is
proper because the $Q_j$'s are so). We have that

$$|Q| = |B| = \Omega(\mathrm{Vol}(R)) = \Omega(|X|),$$

and

$$l_2 = \Omega(h) = \Omega(l).$$

Moreover, a translation of $l_2Q$ is contained in $Q_1 + \dots + Q_{l_2}$ and a translation of
$Q_1 + \dots + Q_{l_2}$ is contained in $X_1 + \dots + X_l$. So $X_1 + \dots + X_l$ contains a translation
of $l_2Q$, completing the proof. □

5.9. Proof of Theorem 5.1. With the main lemma in hand, we are ready to
conclude the proof of Theorem 5.1. In order to find a triple $(A', l', n')$ as desired,
we are going to apply the so-called tree argument. This argument was introduced
in [28] and, in spirit, works as follows. Assume that we want to add several
sets $A_1, \dots, A_l$. We shall add them in a special way following an algorithm which
assigns sets to the vertices of a tree. The set at any vertex contains the sum of the
sets of its children. If the set at the root of the tree is not too large, then there is
a level where the sizes of the sets do not increase (compared with the sizes of their
children) too much. Thus, we can apply Freiman's inverse theorems at this level to
deduce useful information. The creative part of this argument is to come up with
a proper algorithm which suits our need.

The reader has already met a simple version of this argument in the proof of
Theorem 3.8. In that proof, the sets at the leaves of the tree are copies of $A$, the sets
at level $i$ are copies of $2^iA$ and the set at the root is $lA$. The set at any vertex is
the sum of the sets at its two children.



The algorithm in the current case is more complicated. Before describing it, let us
assume, without loss of generality, that $l$ is a power of 4 ($l = 4^s$) and $|A_i| = n_1$
and $0 \in A_i$ for all $1 \le i \le l$. Set $A_i^1 = A_i$ for $i = 1, \dots, l$ and $l_1 = l$. Here is the
description of the algorithm.

The algorithm. At the $t$-th step, the input is a sequence $A_1^t, \dots, A_{l_t}^t$ of sets of the same
cardinality $n_t$ where $l_t$ is an even number. Choose a pair $1 \le i < j \le l_t$ which
maximizes $|A_i^t + A_j^t|$ (if there are many such pairs choose an arbitrary one). Denote
the sum $A_i^t + A_j^t$ by $A_1'$. Remove $i$ and $j$ from the index set and repeat the operation
to obtain $A_2'$ and so on. After $l_t/2$ operations we obtain a sequence $A_1', \dots, A_{l_t/2}'$
of sets with decreasing cardinalities. Define $l_{t+1} = l_t/4$. Consider the sequence
$A_1', \dots, A_{l_{t+1}}'$ and truncate all but the last set so that all of them have the same
cardinality (which is $|A_{l_{t+1}}'|$). The truncated sets will be named $A_1^{t+1}, \dots, A_{l_{t+1}}^{t+1}$
and they form the input of the next step. It is clear that $l_t = \frac{l}{4^{t-1}}$ for all plausible
$t$'s. The algorithm halts at time $s + 1$ where $l_{s+1} = 1$.
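
One step of this pairing procedure can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are ours, only the first $l_t/4$ pairings (the ones that are kept) are carried out, and the truncation rule (keep the smallest elements) is an assumption, since the text does not specify which elements to discard.

```python
from itertools import combinations


def sumset(a, b):
    """The sumset A + B."""
    return {x + y for x in a for y in b}


def tree_step(sets):
    """One step of the pairing algorithm on equal-size sets (their number divisible by 4).

    Returns (next_sets, leftover): the l_t / 4 largest pair sums, truncated to a
    common cardinality, and the l_t / 2 input sets that were not used in them.
    """
    remaining = list(range(len(sets)))
    sums = []
    for _ in range(len(sets) // 4):
        # Greedily pick the pair of remaining sets with the largest sumset.
        i, j = max(combinations(remaining, 2),
                   key=lambda p: len(sumset(sets[p[0]], sets[p[1]])))
        sums.append(sumset(sets[i], sets[j]))
        remaining.remove(i)
        remaining.remove(j)
    n_next = len(sums[-1])  # cardinalities are non-increasing, so this is the smallest
    # Assumed truncation rule: keep the n_next smallest elements of each sum.
    next_sets = [set(sorted(s)[:n_next]) for s in sums]
    leftover = [sets[k] for k in remaining]
    return next_sets, leftover
```

Because each pair is chosen to maximize the sumset size among the sets still available, every pair of sets in `leftover` has a sumset of size at most the common cardinality of `next_sets`.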

Notice that $A_i^{t+1}$ is a subset of $[2^tn]$, so $n_{t+1} \le 2^tn$. We first show that there is
some $t \le s$ so that $n_{t+1} \le 4^{d+1}n_t$. Assume otherwise. Then

$$n_{s+1} > 4^{(d+1)s}n_1 = (4^s)^d4^s|A| = 4^sl^d|A| \ge 4^sCn > 2^sn,$$

a contradiction. In the following, let $t$ be the first index so that $n_{t+1} \le 4^{d+1}n_t$. By
the description of the algorithm, there are $l_t/2$ sets among the sets $A_i^t$ such that
the sum of every pair of them has cardinality at most $n_{t+1} \le 4^{d+1}n_t$. Let us call these sets
$B_1, \dots, B_{l_t/2}$. We have

• $|B_1| = \dots = |B_{l_t/2}| = n_t$,

• the $B_i$'s are subsets of the interval $[2^{t-1}n]$,

• $|B_i + B_j| \le 4^{d+1}n_t$, for all $1 \le i < j \le l_t/2$.

By Lemma 5.8, the sum $B_1 + \dots + B_{l_t/2}$ contains a translation of $l'A'$, where
$l' \ge \epsilon l_t/2$ and $A'$ is a proper GAP with cardinality at least $\epsilon n_t$ and $\epsilon$ is a positive
constant depending on $d$. Moreover, $A'$ is a subset of $[k_12^{t-1}n]$, for some constant
$k_1$ depending on $d$. Set $n' = k_12^{t-1}n$. To conclude the proof, let us verify that $l'$, $n'$
and $A'$ satisfy the required relations. First of all



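$$(l')^d|A'| \ge \Big(\frac{\epsilon l_t}{2}\Big)^d\epsilon n_t \ge \frac{\epsilon^{d+1}}{2^d}\frac{l^d}{4^{(t-1)d}}4^{(t-1)(d+1)}|A| = \frac{\epsilon^{d+1}}{2^d}4^{t-1}l^d|A|. \tag{20}$$

Since $l^d|A| \ge Cn$, it follows that

$$\frac{(l')^d|A'|}{n'} \ge \frac{\epsilon^{d+1}4^{t-1}Cn}{2^dk_12^{t-1}n} \ge \frac{\epsilon^{d+1}C}{k_12^d}.$$

By increasing $C$ (notice that $\epsilon$ and $k_1$ do not depend on $C$), we can assume that
$(l')^d|A'|/n'$ is sufficiently large. This guarantees that the condition of Theorem 3.12
is met. Replacing $d$ by $d'$ in (20) one can verify that for any $d' \le d$

$$(l')^{d'}|A'| = \Omega(l^{d'}|A|),$$

concluding the proof. □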

6. Folkman's conjecture on subcomplete sequences

For a (finite or infinite) set $A$, $S_A$ denotes the collection of subset sums of $A$

$$S_A = \Big\{\sum_{x \in B}x \;\Big|\; B \subset A, |B| < \infty\Big\}.$$
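
For a finite multiset the collection of subset sums can be computed directly. The sketch below is only an illustration with hypothetical helper names; `longest_ap` measures the longest arithmetic progression with a prescribed difference inside a finite set, which is the finite analogue of the progressions studied in this section.

```python
def subset_sums(multiset):
    """All sums of sub-multisets of a finite multiset (the empty sum 0 included)."""
    sums = {0}
    for x in multiset:
        sums |= {s + x for s in sums}
    return sums


def longest_ap(values, difference):
    """Length of the longest arithmetic progression with the given difference inside values."""
    values = set(values)
    best = 0
    for v in values:
        if v - difference in values:
            continue  # v is not the first term of a maximal progression
        length = 1
        while v + length * difference in values:
            length += 1
        best = max(best, length)
    return best
```

For instance, the multiset $\{1, 2, 2\}$ has subset sums $\{0, 1, \dots, 5\}$, an arithmetic progression of length 6 with difference 1.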



An infinite sequence $A$ of positive integers is subcomplete if $S_A$ contains an infinite
arithmetic progression. Subcomplete sequences have been studied extensively and
we refer the reader to Section 6 of the monograph [9] by Erdős and Graham for a
survey. For an infinite sequence $A$, we use $A(n)$ to denote the number of elements
of $A$ between 1 and $n$. This number could be larger than $n$ as $A$ might contain the
same number many times. In 1966, Folkman made the following conjecture.

Conjecture 6.1. There is a constant $C$ such that the following holds. If $A =
\{a_1 \le a_2 \le a_3 \le \dots\}$ is an infinite non-decreasing sequence of positive integers
and $A(n) \ge Cn$, for all sufficiently large $n$, then $A$ is subcomplete.

Folkman's conjecture was considered by Erdős and Graham as the most important
conjecture concerning subcomplete sequences ([9], Section 6). Folkman himself
proved that the conjecture holds under the stronger condition that $A(n) \ge n^{1+\epsilon}$,
where $\epsilon$ is an arbitrary positive constant. The conjecture is sharp, as one cannot
replace $n$ by $n^{1-\epsilon}$. To show this, let us present an observation of Erdős [8].

Fact 6.2. Consider an infinite sequence $A = \{a_1, a_2, \dots\}$. If

$$\limsup_{i \to \infty}\Big(a_i - \sum_{j=1}^{i-1}a_j\Big) = \infty, \tag{21}$$

then $A$ is not subcomplete.

To verify Fact 6.2, notice that if $A$ is subcomplete and $d$ is the difference of an infinite
arithmetic progression contained in $S_A$, then $d$ is lower bounded by
$\limsup_{i \to \infty}(a_i - \sum_{j=1}^{i-1}a_j)$.

For any fixed $\epsilon > 0$, it is simple to find a non-decreasing sequence $A$ such that
$A(n) = \Omega(n^{1-\epsilon})$ and $A$ satisfies (21).

Using a special case of Theorem 5.1 (Corollary 5.2), we are able to confirm
Folkman's conjecture.

Theorem 6.3. There is a constant $C$ such that the following holds. If $A = \{a_1 \le
a_2 \le a_3 \le \dots\}$ is an infinite non-decreasing sequence of positive integers and
$A(n) \ge Cn$, for all sufficiently large $n$, then $A$ is subcomplete.

The rest of this section is devoted to the proof of Theorem 6.3, which relies on
Corollary 5.2. First, we prove a sufficient condition for subcompleteness. This
condition is of independent interest and will be used for another problem in Section
9. To complete the proof, we show that any sufficiently dense sequence satisfies
this condition. This part of the proof makes significant use of Corollary 5.2.

6.4. A sufficient condition for subcompleteness. We say that a sequence $A$
admits a good partition if it can be partitioned into two subsequences $A'$ and $A''$
with the following two properties

• There is a number $d$ such that $S_{A'}$ contains an arbitrarily long arithmetic
progression with difference $d$.

• Let $A'' = b_1 \le b_2 \le b_3 \le \dots$. For any number $K$, there is an index $i(K)$
such that $\sum_{j=1}^{i-1}b_j \ge b_i + K$ for all $i \ge i(K)$.

Admitting a good partition is a sufficient condition for subcompleteness.
Lemma 6.5. Any sequence $A$ which admits a good partition is subcomplete.

Proof of Lemma 6.5. We start with a definition.

Definition 6.6. An infinite sequence $B = \{b_1 \le b_2 \le b_3 \le \dots\}$ is a $(d, L)$-net if
$b_{i+1} - b_i \le L$ and is divisible by $d$ for all $i = 1, 2, \dots$

Observe that if $B$ is a $(d, L)$-net and $Q$ is a finite arithmetic progression with
difference $d$ and length larger than $L/d$, then $B + Q$ contains an infinite arithmetic
progression with difference $d$. This observation is the leading idea in what follows.
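
This observation can be checked on a finite prefix of a net. The following sketch is an illustration under our own conventions: the helper names are hypothetical, only a finite prefix of $B$ is used, and $Q$ is taken to be $\{0, d, \dots, (k-1)d\}$ with $k > L/d$ terms.

```python
def is_net_prefix(b, d, big_l):
    """Check the (d, L)-net conditions on a finite non-decreasing prefix b."""
    return all(0 <= y - x <= big_l and (y - x) % d == 0 for x, y in zip(b, b[1:]))


def net_plus_progression(b, d, terms):
    """Return B + Q as a sorted list, where Q = {0, d, ..., (terms - 1) d}."""
    q = [k * d for k in range(terms)]
    return sorted({x + y for x in b for y in q})
```

With `b = [0, 12, 21, 24, 36]`, `d = 3`, `L = 12` and a progression of 5 terms, the sum covers every multiple of 3 from 0 to 48: each translate of $Q$ reaches at least as far as the next element of the net.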

Assume that $A$ admits a good partition and let $Q_0, Q_1, Q_2, \dots$ be arithmetic
progressions with the same difference $d$ and strictly increasing lengths contained in $S_{A'}$.
The existence of the $Q_i$'s is guaranteed by the first property of a good partition.

Next, we focus on $A''$. Let $X$ be the set of divisors $d'$ of $d$ with the following
property: all but finitely many elements of $A''$ are divisible by $d'$. Since $1 \in X$,
$X$ is not empty and thus has a maximum element $d_1$. By throwing away finitely many
elements, we can assume that all elements of $A''$ are divisible by $d_1$. Next, discard
all elements $y$ (in the remaining sequence) if there is only a finite number of elements
of $A''$ which equal $y$ modulo $d$. Again, we discard only a finite number of elements
so the remaining sequence still has the same density as $A''$. Thus, we can assume
that $A'' = \{b_1d_1 \le b_2d_1 \le \dots\}$ where the $b_i$'s have the following property: let
$b_i'$ be the remainder when dividing $b_i$ by $d$. For each $i$, there are infinitely many
$j$'s such that $b_i' = b_j'$. Moreover, the greatest common divisor of the $b_i'$'s equals
one modulo $d$ by the definition of $d_1$. We next need the following elementary fact,
which is a consequence of the Chinese remainder theorem.

Fact 6.7. Let $1 \le z_1 < z_2 < \dots < z_h < d$ be positive integers. If $\gcd(z_1, \dots, z_h) =
1 \pmod d$, then there are integers $0 \le a_1, \dots, a_h < d$ such that $\sum_{j=1}^h a_jz_j =
1 \pmod d$.

By Fact 6.7 and the property of $A''$ described in the previous paragraph, we can
find $(d - 1)$ mutually disjoint finite subsets $X_1, \dots, X_{d-1}$ of $A''$ so that the sum
of the elements in each subset equals $d_1$ modulo $d$. Denote these sums by $x_1d +
d_1, \dots, x_{d-1}d + d_1$, where the $x_i$'s are non-negative integers. For any arithmetic
progression $Q_j$ with length $l \ge 3(x_1 + \dots + x_{d-1})$, the set $Q_j + S_{\{x_1d+d_1, \dots, x_{d-1}d+d_1\}}$
contains an arithmetic progression with difference $d_1$ and length at least $l/2$ (recall
that $Q_j$ has difference $d$ which is divisible by $d_1$). Since the lengths of the $Q_j$'s
go to infinity with $j$, we can conclude that $S_{A'} + S_{\{x_1d+d_1, \dots, x_{d-1}d+d_1\}}$ contains an
arbitrarily long arithmetic progression with difference $d_1$.

Set $A''' = A'' \setminus \bigcup_{i=1}^{d-1}X_i$; to complete the proof of the lemma, it suffices to prove
that $S_{A'''}$ contains a $(d_1, L)$-net for some constant $L$. Let $S_{A'''} = \{s_1 < s_2 < \dots\}$.
Every element of $A'''$ is divisible by $d_1$ and so are all $s_i$'s. Therefore, it suffices
to exhibit the existence of a constant $L$ satisfying $s_{i+1} - s_i \le L$ for all $i$. The
existence of $L$ follows directly from the following observation, due to Graham [15],
and the second property of a good partition (this is the only place where we use
this property).

Fact 6.8. Let $Y = y_1 \le y_2 \le \dots$ be an infinite sequence of positive integers and
$S_Y = \{s_1 < s_2 < \dots\}$. If $y_{m+1} \le \sum_{i=1}^m y_i$ for all sufficiently large $m$, then there is
some $L$ such that $s_{i+1} - s_i \le L$ for all $i$.

Fact 6.8 is not too hard and the reader might want to consider it as an exercise.

6.9. Proof of Theorem 6.3. We first present a lemma which provides a link
between good partitions and subcompleteness. This lemma is a simple, but a bit
tricky, consequence of Corollary 5.2.

Lemma 6.10. There is a constant $C$ such that the following holds. If $A$ is a
multi-set of positive integers between 1 and $n$ and $|A| \ge Cn$, then $S_A$ contains an
arithmetic progression of length $n$.

Proof of Lemma 6.10. We show that the same constant $C$ in Corollary 5.2
suffices. Without loss of generality, we assume that $C$ is an integer and $|A| = Cn$.
If the multi-set $A$ contains an element $a$ of multiplicity $n$, then the arithmetic
progression $a, 2a, \dots, na$ is a subset of $S_A$ and we are done. In the other case, we
can partition the $Cn$ elements of $A$ into $n$ sets $X_1, \dots, X_n$ such that each $X_i$ consists
of exactly $C$ different elements. The sum $X_1 + \dots + X_n$ is a subset of $S_A$. Corollary
5.2 implies that the sum $X_1 + \dots + X_n$ contains an arithmetic progression of length
$n$, given that $C$ is sufficiently large. This concludes the proof of the lemma. □
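
The partition step in this proof can be made explicit. The sketch below uses a round-robin rule, which is our own choice and not taken from the text: after sorting, the $k$-th element is placed in part $k \bmod n$, so equal elements (which occupy fewer than $n$ consecutive positions) always land in different parts.

```python
from collections import Counter


def partition_into_distinct_sets(multiset, n):
    """Split a multiset of size C * n, with all multiplicities below n, into n sets of C distinct elements."""
    if len(multiset) % n != 0:
        raise ValueError("the multiset size must be a multiple of n")
    if max(Counter(multiset).values()) >= n:
        raise ValueError("some element has multiplicity at least n")
    parts = [[] for _ in range(n)]
    # Round-robin over the sorted elements: position k goes to part k mod n.
    for k, x in enumerate(sorted(multiset)):
        parts[k % n].append(x)
    return [set(p) for p in parts]
```

Each part receives exactly $C$ elements, and they are pairwise different because a run of equal elements is shorter than $n$.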

With Lemma 6.10 in hand, we are in a position to prove that the sequence $A$
in Theorem 6.3 admits a good partition, provided that the constant $C$ in this
theorem is sufficiently large. The partition is the most natural one. Assume that
the elements of $A$ are ordered non-decreasingly, $A = a_1 \le a_2 \le a_3 \le \dots$; $A'$ consists
of the elements with odd indices, $A''$ consists of those with even indices.

By definition, $A'' = \{a_2, a_4, a_6, \dots\}$. Since $A(n) \ge Cn$ for all sufficiently large
$n$ (recall that $A(n)$ is the number of elements of $A$ between 1 and $n$), for every
sufficiently large even number $j$,

$$a_j \le j/C < j/5 < a_2 + a_4 + \dots + a_{j-2} - j/4,$$

which guarantees the property required for $A''$.

It remains to check the property concerning $A'$. As $A$ has density $Cn$, $A'$ has
density $Cn/2$ so we can assume that $A' = \{b_1 \le b_2 \le \dots\}$, where $b_m \le 2m/C$ for
all sufficiently large $m$. Let $A'[m]$ be the set consisting of the first $m$ elements of
$A'$. Fix a sufficiently large $m$ and define $A_0 = A'[m]$ and $A_i = A'[2^im] \setminus A'[2^{i-1}m]$.
The set $A_i$ has $2^{i-1}m$ elements and is a subset of the interval $[2^{i+1}m/C]$.

To conclude the proof, we make use of the following lemma, proved in [27].

Lemma 6.11. Let P be a generalized arithmetic progression of rank 2,
$P = \{x_1a_1 + x_2a_2 \mid 0 \le x_i \le l_i\}$, where $l_i \ge 5a_{3-i}$ for i = 1,2. Then P contains an arithmetic
progression of length $l_1 + l_2$ whose difference is $\gcd(a_1, a_2)$.
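
A small numerical illustration of Lemma 6.11 may help. The sketch below is an assumption-laden brute-force check on one example, not part of the paper's argument; the function names `gap_rank2` and `longest_ap_with_difference` are hypothetical, and the hypothesis is read as $l_i \ge 5a_{3-i}$.

```python
from math import gcd

def gap_rank2(a1, a2, l1, l2):
    """All values x1*a1 + x2*a2 with 0 <= x1 <= l1 and 0 <= x2 <= l2."""
    return {x1 * a1 + x2 * a2 for x1 in range(l1 + 1) for x2 in range(l2 + 1)}

def longest_ap_with_difference(points, d):
    """Length of the longest arithmetic progression with difference d in points."""
    best = 0
    for p in points:
        if p - d in points:
            continue  # p is not the first term of a maximal progression
        length = 1
        while p + length * d in points:
            length += 1
        best = max(best, length)
    return best

a1, a2, l1, l2 = 2, 3, 15, 10      # l1 >= 5*a2 and l2 >= 5*a1
P = gap_rank2(a1, a2, l1, l2)
run = longest_ap_with_difference(P, gcd(a1, a2))
```

In this example P equals {0} together with the interval 2, ..., 58 and the point 60, so the longest progression with difference 1 has 57 terms, comfortably more than $l_1 + l_2 = 25$.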

By Lemma 6.10 (provided that C is sufficiently large), $S_{A_i}$ contains an arithmetic
progression $P_i$ of length $l_i = 2^{i+1}m/C$ for all i. Set $Q_0 = P_0$ (and assume that $d_0$ is
the difference of $Q_0$) and consider the generalized arithmetic progression $Q_0 + P_1$.
This is a generalized arithmetic progression of rank 2 with volume $l_0l_1$. Moreover,
this two-dimensional generalized arithmetic progression is a subset of an interval
of small length, so one can easily check that its differences are relatively small and
satisfy the assumption of Lemma 6.11. This lemma implies that $Q_0 + P_1 = P_0 + P_1$
contains an arithmetic progression $Q_1$ of length $l_0 + l_1 - 2$ with difference $d_1$ which
is a divisor of $d_0$. (The -2 term comes from the fact that in Lemma 6.11, the edges
of P have length $l_1 + 1$ and $l_2 + 1$, respectively; this term, of course, plays no role.)
Similarly, by considering $Q_1 + P_2$ we obtain an arithmetic progression $Q_2$ of length
$l_0 + l_1 + l_2 - 3$ with difference $d_2$ which is a divisor of $d_1$ and so on. The sequence
$d_0, d_1, d_2, \dots$ is non-increasing, so there is an index j so that $d_i = d_j = d$ for
all $i \ge j$. The arithmetic progressions $Q_j, Q_{j+1}, Q_{j+2}, \dots$ have strictly increasing
lengths and the same difference d. Moreover, each $Q_i$ is a subset of $S_{A'}$ and this
completes the proof. □



7. SUMSETS WITH DISTINCT SUMMANDS 

In this section, we strengthen Theorem 3.12 in another direction. Instead of the
sumset $lA$, we are going to consider the much more restricted sumset $l^*A$, which
consists of the sums $a_1 + \dots + a_l$ where the $a_i$'s are different elements of A.

Theorem 7.1. For any fixed positive integer d there are positive constants C and
c depending on d such that the following holds. For any positive integers n and l
and any set $A \subset [n]$ satisfying $l \le |A|/2$ and $l^d|A| \ge Cn$, $l^*A$ contains a proper
GAP of rank $d'$ and volume at least $c\,l^{d'}|A|$, for some integer $1 \le d' \le d$.

The requirement that the summands must be different usually poses a great
challenge in additive problems. One of the most well-known examples is the celebrated
Erdos-Heilbronn conjecture. In order to describe this conjecture, let us start
with the classical Cauchy-Davenport theorem which asserts that if A is a set of
residues modulo n, where n is a prime, then $|2A| \ge \min\{n, 2|A| - 1\}$. For A being
an arithmetic progression, the bound is sharp. Now let us consider $2^*A$. We
want to bound $|2^*A|$ from below with something similar to the Cauchy-Davenport
bound. Observe that in the special case when A is an arithmetic progression,
$|2^*A| = \min\{n, 2|A| - 3\}$. Thus one may guess that

$$|2^*A| \ge \min\{n, 2|A| - 3\} \tag{22}$$

holds for any set A. This is what Erdos and Heilbronn conjectured. While the
Cauchy-Davenport theorem is quite easy to prove, the Erdos-Heilbronn conjecture had been
open for about thirty years until it was solved by da Silva and Hamidoune in 1994
[7].
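
The two extremal values quoted above are easy to confirm on a small instance. The following sketch is only an illustration under our own choice of example (p = 13 and the progression 0, 1, 2, 3, 4); the helper names are hypothetical.

```python
from itertools import combinations

def sumset_mod(A, p):
    """2A modulo p: sums a + b with a, b in A (repetition allowed)."""
    return {(a + b) % p for a in A for b in A}

def restricted_sumset_mod(A, p):
    """2*A modulo p: sums a + b of two different elements of A."""
    return {(a + b) % p for a, b in combinations(A, 2)}

p = 13
A = [0, 1, 2, 3, 4]                       # an arithmetic progression mod p
size_2A = len(sumset_mod(A, p))
size_2starA = len(restricted_sumset_mod(A, p))
```

For this progression 2A is {0, ..., 8} and 2*A is {1, ..., 7}, matching $2|A| - 1 = 9$ and $2|A| - 3 = 7$.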

It is now not so big a surprise that Theorem 7.1 is harder and deeper than both 
Theorem 3.12 and Theorem 5.1. The proof of Theorem 7.1 uses Theorem 3.12 as a 
lemma and requires lots of additional arguments, but let us take a gentle start by 
introducing some simple ideas. 

7.2. The initial ideas. The initial ideas in the proof of Theorem 7.1 are similar
to those in the proof of Theorem 5.1. We want to show that there are numbers
$l'$, $n'$ and a set $A'$ such that

• $A'$ is a subset of $[n']$ and $n'$, $|A'|$ satisfy the conditions of Theorem 3.12,
namely $(l')^d|A'|/n'$ is sufficiently large.

• $(l')^{d'}|A'| = \Omega(l^{d'}|A|)$ for all $1 \le d' \le d$.

In the rest of the proof, we call a triple $(A', l', n')$ perfect if it satisfies the above two
conditions. If we could show that there is a perfect triple $(A', l', n')$ such that $l'A'$
is a subset of $l^*A$, then an application of Theorem 3.12 to this triple immediately
implies the statement of Theorem 7.1.

It is useful to notice that in Theorem 7.1, instead of the assumption $l \le |A|/2$, we
can afford a stronger assumption that $l \le \epsilon|A|$ for any positive constant $\epsilon$, at the
cost of increasing the constant C. One can argue as follows. First one puts aside
$(1 - \epsilon)l$ elements from A. Next, consider the pair $(A_1, l_1)$ where $A_1$ is the set of the
remaining $|A| - (1 - \epsilon)l$ elements and $l_1 = \epsilon l$. It is trivial that $l_1 \le \epsilon|A_1|$. On the
other hand, the sum of an element from $l_1^*A_1$ and the sum of the $(1 - \epsilon)l$ elements
put aside is an element of $l^*A$. So if $l_1^*A_1$ contains a proper GAP P, then $l^*A$
contains a translation of P.
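
The translation step in this argument can be checked by brute force on a toy example. This is a sketch under our own assumptions (a small hand-picked set and $\epsilon = 1/2$); `restricted_sumset` is a hypothetical helper implementing the definition of $l^*A$.

```python
from itertools import combinations

def restricted_sumset(A, l):
    """l*A: all sums of l distinct elements of A."""
    return {sum(c) for c in combinations(A, l)}

A = [1, 2, 4, 7, 11, 16, 22, 29]
l = 4
reserved = A[:2]                  # elements put aside
A1 = A[2:]                        # remaining elements
l1 = l - len(reserved)            # number of summands still to be chosen
shift = sum(reserved)
translated = {shift + s for s in restricted_sumset(A1, l1)}
full = restricted_sumset(A, l)
```

Every element of `translated` is a sum of l distinct elements of A, so the shifted copy of $l_1^*A_1$ sits inside $l^*A$.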

The above argument also shows that for any $l_1 \le l$, if $A_1$ is a subset of at most
$|A| - (l - l_1)$ elements of A, then $l_1^*A_1$ is a subset of a translation of $l^*A$.

In the proof of Theorem 7.1, we shall assume that $l \le \epsilon|A|$, whenever needed. We
shall also assume that $l - 1$ elements of A are put aside in case we need them to
create the sum of exactly l elements. These assumptions provide us some flexibility
in constructing a perfect triple. In particular, we shall not need to show that $l'A'$
is a subset of $l^*A$; it suffices to show that $l'A'$ is a subset of a translation of $\tilde{l}^*A$,
for some $\tilde{l} \le l$.

The main part of the proof is to construct a perfect triple and this is significantly
harder than what we did in the proof of Theorem 5.1. However, when $|A|$ is large
the construction is relatively simple and we start with this case. The treatment
of the harder case when $|A|$ is relatively small starts in subsection 7.5, where we
present a key structural lemma. The proof of this lemma occupies the rest of this
section. In the next section, Section 8, we present the rest of the proof of Theorem
7.1.



7.3. The case when $|A|$ is large. Let $A_1$ be a subset of A with cardinality $l - 1$
and set $A_2 = A \setminus A_1$. Since $|A| \ge 2l$,

$$|A_2| \ge |A|/2. \tag{23}$$

We assume, with foresight (and with room to spare), that $|A|^2 \ge 80Cn\log_2 n$ and
$l^d|A| \ge 160 \times 2^dCn$, where C is the constant in Theorem 3.12.

Define $m_i = 2^i$ for all $1 \le i \le t$, where t is the smallest index such that $m_t \ge |A_2|/2$.
Since $1 \le |A| \le n$, $t \le \log_2 n$. Let $S_i$ be the set of those numbers in $[2n]$ which
can be represented as the sum of two different elements in $A_2$ in at least $m_i$ and
less than $m_{i+1}$ ways. It is essential to observe that $m_iS_i$ is a subset of $(2m_i)^*A$.
On the other hand, a simple double counting argument gives



E-.|^.|>('^2'')-4n>g = ^. (24) 
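
The double counting behind this estimate can be illustrated numerically. The sketch below is our own illustration, not the paper's computation: it counts representations as sums of two different elements, groups the sums into the dyadic classes $S_i$ with $m_i = 2^i$, and checks that the total number of pairs is a binomial coefficient. The names `representation_counts` and `dyadic_classes` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

def representation_counts(A):
    """r(x) = number of ways to write x as a sum of two different elements of A."""
    return Counter(a + b for a, b in combinations(A, 2))

def dyadic_classes(r):
    """S_i = {x : 2^i <= r(x) < 2^(i+1)} for i >= 1."""
    classes = {}
    for x, count in r.items():
        if count >= 2:
            i = count.bit_length() - 1      # largest i with 2^i <= count
            classes.setdefault(i, set()).add(x)
    return classes

A2 = list(range(1, 21))
r = representation_counts(A2)
S = dyadic_classes(r)
lower = sum(2 ** i * len(S_i) for i, S_i in S.items())
upper = sum(2 ** (i + 1) * len(S_i) for i, S_i in S.items())
heavy = sum(c for c in r.values() if c >= 2)
total_pairs = sum(r.values())
```

Each unordered pair is counted exactly once, so `total_pairs` equals $\binom{|A_2|}{2}$; the sums with at least two representations are sandwiched between $\sum_i m_i|S_i|$ and $\sum_i m_{i+1}|S_i|$.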

Next, we split X^i^i™*!*^*! i^^o three parts. The first part comprises those mij^il 
where mil^il < Obviously, the contribution of this part to the sum is at most 
tj^ = J. The second part consists of those mi\Si\ where 15^1 < Since the 

sequence rui is geometric, the sum of all m^'s is bounded from above by 2|A2|. 
Thus, the contribution of the second part is upper bounded by 2|A2|-^^^ < |. The 
third part contains the remaining mi\Si\'s and, as a consequence of the previous 
estimates, its contribution is at least | . 

Let ii < ^2 < • • • < ij be the indices in the third part. We have 



i2'^iJSiJ>l. (25) 
We are going to consider two cases:

(I) 2mi. > I: In this case \Si. \ > > and ^Si- is a subset of I* A. In view of
the initial ideas presented in the previous subsection, we set A' = Si^ , n' = 2n and 
I' = 1/2. Since l'^\A\ > 160 x 2'^Cn 



and 



i"^'\A\ = n{i'^'\A\), 

for any 1 < d' < d. The last two estimates guarantee that the triple {A',n',l') is 
perfect and we are done. 

(II) $2m_{i_j} < l$: In this case, we prove that $l^*A$ contains an arithmetic progression of
length $cl|A|$ (in other words, one can set the parameter $d'$ in Theorem 7.1 equal to
one). For any integer a which is the sum of $l - 2m_{i_j}$ different elements in $A_1$ (the
set we put aside at the beginning of the proof), $a + m_{i_j}S_{i_j}$ is a subset of $l^*A$. On
the other hand, as $|A|^2 \ge 80n\log_2 n$,



mi^\Si.\>l> Jf^ >Cn 
^ ^ 4t 20 log2 n 



Theorem 3.12 implies that TUi^Si^ contains an arithmetic progression of length 



h\ h\- 4^ - 201og2n - ' ' 

if I < |j4|/201og2 n. The case when / is larger than |j4|/201og2 n requires an extra 
argument. Notice that by the definition of the third partial sum and the assumption 

on \A\ 



Irui \Si \ >-^> 1^1' > Cn. 
2 - 24t - 401og2n " 

Given this, we can apply Theorem 3.12 to $\frac{1}{2}m_{i_g}S_{i_g}$ to obtain an arithmetic
progression of length $cm_{i_g}|S_{i_g}|$, for every index g in the third partial sum. To conclude,
we use the following simple fact to glue these arithmetic progressions together.

Fact 7.4. Any element in $\sum_{g=1}^{j}\frac{1}{2}m_{i_g}S_{i_g}$ can be represented by the sum of
$m = \sum_{g=1}^{j}m_{i_g}$ different elements from $A_2$.

Proof of Fact 7.4. Greedy algorithm. □

It follows that $\sum_{g=1}^{j}\frac{1}{2}m_{i_g}S_{i_g}$ is a subset of $m^*A_2$, with m defined as in Fact 7.4.
Finally, by applying Lemma 6.11 iteratively one can show that $\sum_{g=1}^{j}\frac{1}{2}m_{i_g}S_{i_g}$
contains an arithmetic progression of length

cY,miJS,J>cl>cl^>c^. 

9=1 

Now we can add additional elements from Ai to m*A2 to obtain a subset of I* A. 

□ 

This simple proof, unfortunately, cannot be repeated for the case $|A| = o(\sqrt{n})$.
However, the arguments presented here will be useful later on.

7.5. A structural lemma. In view of the result in the previous subsection, we
only have to deal with the case $|A| = O(\sqrt{n\log n})$. Actually, this upper bound
on $|A|$ matters little, but it imposes a bound on l that is critical. Notice that
if $|A| = O(\sqrt{n\log n})$, then in order to guarantee the assumption $l^d|A| \ge Cn$ of
Theorem 7.1, we must have

$$l = \Omega(n^{1/2d - o(1)}) \gg \log^{10} n.$$

In this subsection, we focus on those pairs $(l, A)$, where $l^d|A|$ is close to n (but not
necessarily larger than n) and l is relatively large. A key step in our proof is the
following structural lemma, which asserts that if $l^*A$ does not yield a proper GAP
as claimed by Theorem 7.1, then A must contain a big subset which has a very
rigid structure.

Lemma 7.6. For any positive constants $\nu$ and d there are positive constants $\delta$, $\alpha$
and $d_1$ such that the following holds. Let A be a subset of $[n]$, l be a positive integer
and $n \ge f(n) \ge 1$ be a function of n such that

$$\max\{\log^{10} n, (40f(n)\log_2 n)^{1/3d}\} \le l \le |A|/2$$

and $l^d|A|f(n) \ge n$. Then one of the following two statements must hold:

• $l^*A$ contains a proper GAP of rank $d'$ and volume $\Omega(l^{d'}|A|)$ for some
$1 \le d' \le d$.

• There is a subset $\tilde{A}$ of A with cardinality at least $\delta|A|$ which is contained
in a GAP P of rank $d_1$ and volume $O(|A|f(n)^{1+\nu}\log^{\alpha} n)$.

The function f(n) can be seen as a rigidity parameter. The closer $l^d|A|$ is to n, the
more rigid is the structure of A. With some extra work, the lower bound of l in
the lemma can be improved: 10 can be replaced by any constant larger than 1 and
1/3d can be replaced by any positive constant. If we refine the result this way, the
constants $\alpha$, $\nu$ and $d_1$ will also depend on the new constants.

For the proof of Theorem 7.1, we only need the special case when f(n) = 1. We,
however, choose to present Lemma 7.6 in the above general form since it might be
of independent interest and the proof is not significantly harder than that of the
special case.

With f(n) = 1, Lemma 7.6 yields the following corollary.

Corollary 7.7. For any positive constant d there are positive constants $\delta$, $\alpha$ and $d_1$
such that the following holds. Let A be a subset of $[n]$, l be a positive integer such
that $l^d|A| \ge Cn$. Then one of the following two statements must hold:

• $l^*A$ contains a proper GAP of rank $d'$ and volume $\Omega(l^{d'}|A|)$ for some
$1 \le d' \le d$.

• There is a subset $\tilde{A}$ of A with cardinality at least $\delta|A|$ which is contained
in a GAP P of rank $d_1$ and volume $O(|A|\log^{\alpha} n)$.

Notice that the set $\tilde{A}$ in Corollary 7.7 satisfies

$$l^d|\tilde{A}| \ge l^d\delta|A| \ge \delta Cn.$$

Since $\delta$ depends only on d, by increasing the constant C in Theorem 7.1, we can
always assume that $\delta C$ is sufficiently large. Thus, given Corollary 7.7, it suffices to
prove Theorem 7.1 under the additional condition that A is a subset of density at
least $1/\log^{\alpha} n$ of a GAP of constant rank, where both the rank and $\alpha$ are constants
depending on d. We present this proof in the next section. A reader who is eager
to see this proof can delay the reading of the rest of this section and jump right to
Section 8.

The rest of this section is devoted to the proof of Lemma 7.6. As this proof is fairly
long, we break it into four parts, each of which contains arguments of a fairly different
nature. The main technical ingredient of this proof is again a tree argument, similar
to what we used in the proof of Theorem 5.1. However, the algorithm here is
more complicated than the algorithm in Section 4, and the analysis is also more
challenging.

In order to set up the algorithm we first need to produce a large number of subsets
of A with a certain property. This will be done in the next subsection. In
subsection 7.10, we describe our algorithm together with several simple observations.
Subsection 7.12 is devoted to an inverse argument, which we use to derive the
desired properties of A. This derivation is quite different from and much more tricky
than the one in Section 5. We wrap up with the final subsection, subsection 7.14,
which contains the verification of an estimate claimed in subsection 7.12.

7.8. Small sets with big sums. The goal of this subsection is to show that any
finite set A contains a subset B of small size ($O(\ln|A|)$) such that $|l^*B|$ is large,
where $l = |B|/2$.

Lemma 7.9. Let A be a finite set of real numbers where $|A|$ is sufficiently large.
Then A contains a subset B of at most $20\log_2|A|$ elements such that $(|B|/2)^*B$ has
cardinality at least $|A|$.

Proof of Lemma 7.9. We can assume, without loss of generality, that $|A|$ is
sufficiently large so that $|A| \ge 100\log_2|A|$. We choose the first two elements of A,
say $a_1, a_2$, arbitrarily. Once $a_1, \dots, a_{2i}$ have been chosen, we next choose $a_{2i+1}$ and
$a_{2i+2}$ from $A \setminus \{a_1, \dots, a_{2i}\}$ such that

$$|(i+1)^*\{a_1, \dots, a_{2i+1}, a_{2i+2}\}| \ge 1.1\,|i^*\{a_1, \dots, a_{2i}\}| \tag{26}$$

(if there are many possible pairs, we choose an arbitrary one). We stop at time
T when $|T^*\{a_1, \dots, a_{2T}\}| \ge |A|$ and let $B = \{a_1, \dots, a_{2T}\}$. It is clear that
$|B| \le 2\log_{1.1}|A| \le 20\log_2|A|$. The only point we need to make now is to show that
as long as $|i^*\{a_1, \dots, a_{2i}\}| < |A|$, we can always find a pair $(a_{2i+1}, a_{2i+2})$ to satisfy
(26). Assume (for contradiction) that we get stuck at the $i$-th step and denote
by S the sum set $i^*\{a_1, \dots, a_{2i}\}$. For any two numbers $a, a' \in A \setminus \{a_1, \dots, a_{2i}\}$,
$(a + S) \cup (a' + S)$ is a subset of $(i+1)^*\{a_1, \dots, a_{2i}, a, a'\}$. So by the assumption
we have

$$|(a + S) \cup (a' + S)| < 1.1|S|.$$

Since both $a + S$ and $a' + S$ have $|S|$ elements, it follows that their intersection
has at least $.9|S|$ elements. This implies that the equation $a' - a = x - y$ has at
least $.9|S|$ solutions $(x, y)$ where $x \in S$ and $y \in S$. Now let us fix a as the smallest
element of $A \setminus \{a_1, \dots, a_{2i}\}$ and choose $a'$ arbitrarily. There are $|A| - 2i - 1 \ge .9|A|$
choices for $a'$, each of which generates at least $.9|S|$ pairs $(x, y)$ where both x and
y are elements of S. As all $(x, y)$ pairs are different, we have that

$$|S|^2 \ge .9|A| \times .9|S|,$$

which implies that $|S| \ge |A|$, a contradiction. This concludes the proof.

□
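
The selection procedure in the proof can be sketched in code. This is a brute-force illustration under our own assumptions: the names `restricted_sumset` and `small_set_with_big_sums` are hypothetical, the first admissible pair is taken (the proof allows any admissible pair), and the exhaustive search is only practical for small inputs.

```python
from itertools import combinations

def restricted_sumset(elems, k):
    """k*elems: all sums of k distinct elements."""
    return {sum(c) for c in combinations(elems, k)}

def small_set_with_big_sums(A, growth=1.1):
    """Grow B two elements at a time so that |(i+1)*B_new| >= growth * |i*B_old|."""
    A = list(A)                       # assumed to consist of distinct numbers
    B = A[:2]                         # the first two elements are arbitrary
    rest = A[2:]
    current = restricted_sumset(B, 1)
    while len(current) < len(A):
        i = len(B) // 2
        for a, b in combinations(rest, 2):
            candidate = restricted_sumset(B + [a, b], i + 1)
            if len(candidate) >= growth * len(current):
                B += [a, b]
                rest.remove(a)
                rest.remove(b)
                current = candidate
                break
        else:
            raise RuntimeError("no pair achieves the required growth")
    return B, current

A = [2 ** k for k in range(20)]
B, sums = small_set_with_big_sums(A)
```

For powers of two all subset sums are distinct, so the sizes grow as 2, 6, 20 and the procedure stops with six elements in B.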



Many good small sets. Consider a set A as in Theorem 7.1. Apply Lemma 7.9 to
A to obtain a small set $A_1$. Next, apply the lemma to $A \setminus A_1$ to obtain a small set $A_2$
and so on. Each time we add to $A_i$ a few "dummy" elements to make its cardinality
exactly $20\log_2|A|$. Stop when $A \setminus (\cup_{i=1}^{m}A_i)$ has less than $2|A|/3$ elements for the
first time. Without loss of generality, we can assume that $20\log_2|A|$ is even and
set $l_0 = 10\log_2|A|$. We have a collection $A_1, \dots, A_m$ of disjoint subsets of A with
the following properties:

• $|A_1| = \dots = |A_m| = 20\log_2|A| = 2l_0$.

• $|l_0^*A_i| \ge (2/3 - o(1))|A| \ge |A|/2$.

• $|A \setminus (\cup_{i=1}^{m}A_i)| = (2/3 + o(1))|A|$.

Here we assume that $\log_2|A| = o(|A|)$ which explains the error terms o(1) in the
last two properties. In the next subsection, we consider an algorithm which uses
the sets $A_i$ as input.



7.10. The algorithm. Set $B_i = l_0^*A_i$ for all $1 \le i \le m$. We now give a description
of our algorithm. This algorithm constructs a subset of $l^*A$ in a particular way.
We shall exploit the fact that the cardinality of this subset is at most $|l^*A| \le ln$
(since $l^*A$ itself is a subset of $[ln]$) in order to derive information about A.

The algorithm. To start, set $m_0 = m$. Truncate the sets $B_i$ so each of them has
exactly $b_0 = |A|/2$ elements. Denote by $B_i^0$ the truncation of $B_i$. We start with
the sequence of sets $B_1^0, \dots, B_{m_0}^0$, each of which has exactly $b_0$ elements. Without
loss of generality, we may assume that $m_0$ is a power of 4. At the beginning, we
call the elements in A^^^ = A\(U™ ^Aj) available. 

A general step of the algorithm functions as follows. The input is a sequence
$B_1^t, \dots, B_{m_t}^t$ of sets of the same cardinality $b_t$. Consider the sets $\cup_{h=1}^{K}(B_i^t + B_j^t + x_h)$
where $1 \le i < j \le m_t$ and $x_1, \dots, x_K$ are different available elements (K is a large
constant to be specified later). Choose $i, j, x_1, \dots, x_K$ such that the cardinality of
$B'_1 = \cup_{h=1}^{K}(B_i^t + B_j^t + x_h)$ is maximum (if there are many possibilities, choose an
arbitrary one). Remove i and j from the index set and the $x_h$'s from the available
set and repeat the operation to obtain $B'_2$ and so on. We end up with a set sequence
$B'_1, \dots, B'_{m_t/2}$ where $|B'_1| \ge \dots \ge |B'_{m_t/2}|$.

Let $m_{t+1} = m_t/4$ and set $b_{t+1} = |B'_{m_{t+1}}|$. Truncate the $B'_i$'s ($i \le m_{t+1}$) so that the
remaining sets have exactly $b_{t+1}$ elements each. Denote by $B_i^{t+1}$ the remaining
subset of $B'_i$. The sequence $B_1^{t+1}, \dots, B_{m_{t+1}}^{t+1}$ is the output of the step.

If $m_{t+1} \ge 4$, then we continue with the next step. Otherwise, the algorithm
terminates.
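
One step of this procedure can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the names `merge` and `one_step` are hypothetical, the maximization is done by exhaustive search (feasible only for tiny inputs), and the truncation keeps the smallest elements, a choice the text leaves open.

```python
from itertools import combinations

def merge(Bi, Bj, shifts):
    """Union over the shifts x of the translates Bi + Bj + x."""
    base = {u + v for u in Bi for v in Bj}
    return {s + x for s in base for x in shifts}

def one_step(sets, available, K):
    """Greedy pairing step: returns the truncated sets and the unused elements."""
    sets, available = list(sets), list(available)
    live, merged = list(range(len(sets))), []
    while len(live) >= 2 and len(available) >= K:
        best = None
        for i, j in combinations(live, 2):
            for shifts in combinations(available, K):
                cand = merge(sets[i], sets[j], shifts)
                if best is None or len(cand) > len(best[0]):
                    best = (cand, i, j, shifts)
        cand, i, j, shifts = best
        merged.append(cand)
        live.remove(i)
        live.remove(j)
        for x in shifts:
            available.remove(x)       # used shifts are no longer available
    merged.sort(key=len, reverse=True)
    keep = len(sets) // 4             # m_{t+1} = m_t / 4
    if len(merged) < keep or keep == 0:
        raise ValueError("not enough sets or available elements for a full step")
    b_next = len(merged[keep - 1])    # b_{t+1} = |B'_{m_{t+1}}|
    return [set(sorted(B)[:b_next]) for B in merged[:keep]], available

sets = [{0, i} for i in range(1, 9)]          # eight input sets
out, left = one_step(sets, list(range(100, 110)), K=2)
```

With eight input sets and K = 2 the step performs four merges, consumes eight available elements, and returns two sets of equal cardinality.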



Let us pause for a moment and make a series of observations. All of these
observations are easy to verify so we omit their proofs.

• Define $l_{t+1} = 2l_t + 1$ for t = 0, 1, 2, .... Then $B_i^t$ is a subset of $l_t^*A$ for any
plausible t and i.

• As A is a subset of $[n]$, $B_i^t$ is a subset of $[l_tn]$.

• For any plausible t, $b_{t+1} \ge 2b_t$.

• After each step, the length of the sequence shrinks by a factor 4.

• At the beginning we have $(2/3 - o(1))|A|$ available elements. The number of
elements $x_h$ used in the whole algorithm is $o(|A|)$, so at any step, there
are always $(2/3 - o(1))|A|$ available elements.

Since $l_0 = O(\log_2 n)$, we can assume, without loss of generality, that $l/l_0$ is a
power of two, $l/l_0 = 2^{s_2}$. Recall that $m_0 = m \approx |A|/6l_0$ (m is slightly larger than
$|A|/6l_0$) and $|A| \ge 2l$. It follows that $l/l_0 \le 4m_0$. As we assume $m_0$ is a power of 4,
$m_0 = 4^{s_1}$, it follows that $2(s_1 + 1) \ge s_2$.

We set $K = 2^{c_1d+1}$, where $c_1$ is a constant at least 9. We first claim that

$$(K/2)^{s_2/2} \ge 40\,l^df(n)\log n. \tag{27}$$

Indeed, observe that

$$(K/2)^{s_2/2} \ge (2^{9d+1}/2)^{s_2/2} = 2^{9ds_2/2}. \tag{28}$$

Recalling the definition of $s_2$, $2^{s_2} = l/l_0$. We assume that $l \ge \log^{10} n \ge l_0^9$, so
$2^{s_2} \ge l^{8/9}$. It follows that

$$2^{9ds_2/2} \ge l^{4d} = l^{3d}\,l^{d} \ge 40\,l^df(n)\log n,$$

by the assumption on l.

We next prove the following fact.

Fact 7.11. There is an index $k \le s_2/2$ such that $b_k < K^kb_0$.

Proof of Fact 7.11. By the second observation we have that $b_k \le l_kn$ for any k.
From the definition of $l_t$ it is easy to prove (using induction) that

$$l_k \le 2^kl_0 + 2^k \le 2^{k+1}l_0.$$
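
The induction is short enough to check mechanically. The sketch below assumes the recurrence $l_{t+1} = 2l_t + 1$ from the observations above and compares it with the closed form $l_k = 2^kl_0 + 2^k - 1$, which is our own reformulation; the function name `l_sequence` is hypothetical.

```python
def l_sequence(l0, steps):
    """Iterate l_{t+1} = 2*l_t + 1 starting from l_0."""
    seq = [l0]
    for _ in range(steps):
        seq.append(2 * seq[-1] + 1)
    return seq

l0 = 10
seq = l_sequence(l0, 12)
# Closed form obtained by unrolling the recurrence.
closed_form_ok = all(l_k == 2 ** k * l0 + 2 ** k - 1 for k, l_k in enumerate(seq))
# The bound used in the proof of Fact 7.11.
bound_ok = all(l_k <= 2 ** k * l0 + 2 ** k <= 2 ** (k + 1) * l0
               for k, l_k in enumerate(seq))
```

Both checks pass for any starting value $l_0 \ge 1$, since $2^k \le 2^kl_0$.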

It follows that $b_k \le 2^{k+1}l_0n$ for any k. Recall that $b_0 = |A|/2$ and $l_0 = 10\log_2|A|$.
If $b_k \ge K^kb_0$, then we should have

$$K^k|A|/2 = K^kb_0 \le b_k \le l_kn \le 2^{k+1}n \times (10\log_2|A|),$$

which implies

$$(K/2)^k|A| \le 40n\log_2|A| \le 40n\log_2 n.$$

On the other hand, (27) and the assumption $l^d|A|f(n) \ge n$ of Lemma 7.6 together
imply

$$(K/2)^{s_2/2}|A| \ge 40\,l^d|A|f(n)\log_2 n \ge 40n\log_2 n,$$

which is a contradiction. The proof is thus complete. □

7.12. The inverse argument. Let k be the first index where $b_k < K^kb_0$. This
means $|B^k_{m_k}| < K^kb_0$. By the description of the algorithm

$$B^k_{m_k} = \cup_{h=1}^{K}(B_i^{k-1} + B_j^{k-1} + x_h) \tag{29}$$

for some i, j and $x_h$'s. Given (29), we are going to exploit the bound $|B^k_{m_k}| < K^kb_0$
in many ways. First, this bound and the definition of k mean that $|B_i^{k-1} + B_j^{k-1}|$ is
relatively small and so we can use Freiman's theorem to derive some facts about the
sets $B_i^{k-1}$ and $B_j^{k-1}$. Next, (29) and the bound on $|B^k_{m_k}|$ imply that there should
be a significant overlap among the sets $(B_i^{k-1} + B_j^{k-1} + x_h)$. Thus, there should
be a correlation between the (available) elements $x_h$. This correlation eventually
leads us to a structural property of the set of available elements. The set $\tilde{A}$ claimed
in the lemma will be a subset of this set.

To start, notice that (29) implies

$$|B^k_{m_k}| \ge |B_i^{k-1} + B_j^{k-1}| \tag{30}$$

where $1 \le i < j \le m_{k-1}$ and both $B_i^{k-1}$ and $B_j^{k-1}$ have cardinality $b_{k-1} \ge K^{k-1}b_0$.
The definition of k then implies that $|B^k_{m_k}| \le Kb_{k-1}$, so

$$|B_i^{k-1} + B_j^{k-1}| \le K|B_i^{k-1}|. \tag{31}$$

Applying Freiman's theorem to (31), we could deduce that there is a generalized
AP R with constant rank containing $B_i^{k-1}$ and $\mathrm{Vol}(R) = O(|B_i^{k-1}|) = O(b_{k-1})$.

We say that two elements u and v of $B_j^{k-1}$ are equivalent if their difference is in
$R - R$. If u and v are not equivalent then the sets $u + B_i^{k-1}$ and $v + B_i^{k-1}$ are
disjoint, since $B_i^{k-1}$ is a subset of R. By (31), the number of equivalence classes is at
most K. Let us denote these classes by $C_1, \dots, C_K$, where some of the $C_g$'s might
be empty. We have $B_i^{k-1} \subset R$ and $B_j^{k-1} \subset \cup_{g=1}^{K}C_g$.

Let us now take a close look at (29). The assumption $|B^k_{m_k}| < K|B_i^{k-1}|$ and (29)
imply that there must be a pair $s_1, s_2$ such that the intersection

$$(B_i^{k-1} + B_j^{k-1} + x_{s_1}) \cap (B_i^{k-1} + B_j^{k-1} + x_{s_2})$$

is not empty. Moreover, the set $\{x_1, \dots, x_K\}$ in (29) was chosen optimally. Thus,
for any set of K available elements, there are two elements x and y such that

$$(B_i^{k-1} + B_j^{k-1} + x) \cap (B_i^{k-1} + B_j^{k-1} + y)$$

is not empty. This implies

$$x - y \in (B_i^{k-1} + B_j^{k-1}) - (B_i^{k-1} + B_j^{k-1}) \subset \cup_{1 \le g,h \le K}[(R + C_g) - (R + C_h)]. \tag{32}$$

Define a graph G on the set of available elements as follows: x and y are adjacent
if and only if $x - y \in (B_i^{k-1} + B_j^{k-1}) - (B_i^{k-1} + B_j^{k-1})$. By the argument above, G
does not contain an independent set of size K, so there should be a vertex x with
degree at least $|V(G)|/K$. By (32), there is a pair $(g, h)$ such that there are at least
$|V(G)|/K^3$ elements y satisfying

$$x - y \in (R + C_g) - (R + C_h). \tag{33}$$
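
The passage from "no independent set of size K" to "a vertex of large degree" rests on a standard greedy argument: if every degree is at most D, picking vertices one at a time and discarding their neighbours yields an independent set of size at least $|V(G)|/(D+1)$. The sketch below illustrates this on a 6-cycle; it is our own illustration (the function name `greedy_independent_set` is hypothetical) and ignores the rounding that the text also ignores.

```python
def greedy_independent_set(adj):
    """adj maps each vertex to the set of its neighbours (undirected graph)."""
    remaining = set(adj)
    chosen = []
    for v in sorted(adj):
        if v in remaining:
            chosen.append(v)
            remaining -= adj[v] | {v}   # drop v and everything adjacent to it
    return chosen

# A cycle on six vertices: every vertex has degree 2.
adj = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
ind = greedy_independent_set(adj)
max_degree = max(len(nb) for nb in adj.values())
```

Each chosen vertex removes at most $D + 1$ vertices from the pool, which gives the bound $|\text{ind}|(D+1) \ge |V(G)|$; read contrapositively, a graph without an independent set of size K must have a vertex of degree roughly $|V(G)|/K$.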

Both $C_g$ and $C_h$ are subsets of translations of R; so the set $\tilde{A}$ of the elements y
satisfying (33) is a subset of a translation of $P = (R + R) - (R + R)$. Recall that
at any step, the number of available elements is (1 — o(l))|A2|, we have 



\A\>{l-o{l))\A,\/K^ = n{\A2\). (34) 

Let us summarize what we have obtained here. We have found a subset A of A of 
density at least (2/3 — o{l))/K^ = and a GAP P which contains A. In order 
to complete the proof of the lemma, it remains to bound the volume of P. We need 
to show that if the first statement of the lemma does not hold, then 



$$\mathrm{Vol}(P) = O(|A|f(n)^{1+\nu}\log^{\alpha} n). \tag{35}$$



At this point, we know that

$$\mathrm{Vol}(P) = O(\mathrm{Vol}(R)) = O(b_k), \tag{36}$$

where $b_k < K^kb_0 \le K^k|A|$. Unfortunately, we still do not know much about $K^k$.
Our next task is to prove that if the first statement of the lemma does not hold,
then

$$K^k = O(f(n)^{1+\nu}\log^{\alpha} n), \tag{37}$$

which implies (35).

In order to verify (37), we need to exploit the definition of the sets even more. 
Notice that when we define B^^^^ , wc choose i and j optimally. On the other hand, 
as nik = jiTik-i, for any remaining index i, we have at least m' = mfe_i/2 choices 
for j. This means that there are m' sets Bj~^, . . . , Bj~^, all of the same cardinality 
bk-i, such that 

\Bt^ +B^-^\<Kbk-i (38) 

for all 1 < r < m'. 

From now on, wc work with the sets B^^^ , 1 < r < m! . By considering equivalent 
classes (as in the paragraph following (31)), we can show that for each r, 
contains a subset Dj- which is a subset of a translation of R and |i)r| > \B'j~^\/K = 
r2(Vol(i?)). 

By Lemma 5.5, there is a constant g such that £>! + • • • + Dg contains a GAP Qi 
with cardinality at least 7Vol(i?) for some positive constant 7. Using the next g 
DiS, we can create Q2 and so on. At the end, wc have m = \ m! / g\ generalized 
AP Qi, . . . , <5^" . Each of these has rank di = rank{R) (this parameter d\ is 
irrelevant in the whole argument) and cardinality at least 7Vol(ii). Moreover, they 
are subsets of translations of the GAP R' = gR which also has volume 0(Vol{R)). 

Consider a GAP Qi. Due to its large volume (compared to the volume of R'), there 
are only 0(1) possibilities for its difference set. Thus, there is a positive constant 71 
such that at least a 71 fraction of the Q^'s has the same difference set. Truncating 
if necessary, we can assume the corresponding sides of these Qj's have the same 
length (the truncation could decrease the volumes by at most a constant factor). 
Since two GAP with the same difference sets and corresponding sides having the 
same length are translations of each other, we conclude that there is a GAP Q (of 
rank di and cardinality at least 'yYol{R)) and an integer m = 0(m ) so that there
are at least m ' translations of Q among the Qi 's. Without loss of generality, we can
assume that these translations are Qi , . . . , Q^^'" ■ Before continuing, let us gather 
some facts about Qi and m . 

• IQI = \Qi\ = 0(Vol(J?)) = Q.{bk) = iliK%) > PK%, for some positive 

constant /?. 

• m = f2(m ) = r2(m ) = ^{nik) = n(mo/4'') > jimo/A'', for some positive 
constant /i. 

To proceed further, we need the following fact, whose proof is delayed until the 

next subsection. 

Fact 7.13. // ^2^^) - log''^^ ^^^''^ there isl <l such that ij)* A contains 
a proper GAP of rank d' and volume \A\) for some I < d' < d. 

In order to have I* A instead of {l)*A one can do the usual "reserving" trick. Prior 
to Fact 7.4), put aside I elements from A for reserve. Repeat the whole proof with 
the remaining set until Fact 7.13. Now, choose / — I arbitrary elements from the 
reserved set and add their sum to the set {lyA obtained in Fact 7.13. The resulting 
set is a subset of I* A and it contains a proper GAP as claimed in Theorem 7.1. 

Now we conclude the proof of the lemma via Fact 7.13. If we assume that the first 
statement of the lemma does not hold, then by this fact we have that 

(2#f)'<-'<""°8''""- 

Recall that we set K = 2'^^'^ where ci is a constant. By setting ci sufficiently large 
compared to l/u, it follows that 

K" <f{ny+-\og^n, 
for some constant a = a{u, d), proving (37). □ 

7.14. Proof of Fact 7.13. To prove Fact 7.13, let us set l' = emm(-l-,mk/2), 

''k 

where e is a sufficiently small positive constant. Without loss of generality, we can 
assume that I' is an integer. The definition of I' and the construction of the Qi's 
imply that for a proper choice of e, I'Q is a translation of a subset of {lyA for some 
I < I. Fact 7.13 follows from Theorem 3.12 and the following 

Fact 7.15. // ( 2x4^ ^ — /(") log2^^ '^j ^^^^ the following two inequalities hold 



{lY\Q\ » hn 



(39)
{l'f\Q\>'''^'\^l^<d'<d (40)

where {l'y\Q\ ^ IhU means that ^' jj^^ tends to infinity with n. 

We need to define I' as above due to the following reason. The tree might be too tall 
(having much more than log2(///o) levels) or too short (having less than log2(Z/Zo) 
levels). In the first case we have to look at some immediate level between the root 
and the leaves. This corresponds to the case = e{l/lk)- In the second case, we 
look at some level very close to the root and this corresponds to the definition 
V = e{mk/2). 

Proof of Fact 7.15. Consider an arbitrary integer d! between 1 and d. The 
definition of V naturally leads to the following two cases: 

Case 1. l/lk < mk/2. In this case I' = e{l/lk)- Recalling that there is a constant 
(j such that \Q\ > PK^bo (see the paragraph preceding Fact 7.13), we have that for 
any d' > 1 



{l'f\Q\ > e^'i^-f X (3K% = '^f\A\^, (41) 

where in the last equation we use the fact that 6o = On the other hand, 

recall that lo = 101og2 \ A\, we have 



Ik < "ir^ h = 20 X 2*= log2 \A\ < 20 X 2*= log2 n 
So, it follows from (41) that for any 1 <d' <d 



nY\0\ > '"'^ l''\A\ > -J^f'\A\(^Y^ 

'^'-2x20<^'^ ''^'2^<''logrn-2x20<^^ ^^^W) log^n' (43) 

where the second inequality follows from the assumption that d' < d. The assump- 
tion on K in Fact 7.15 implies that 



(g)'>/(n)logrn>(^||5)-^log^n,
so the rightmost formula in (42) is larger than $l^{d'}|A|$, for any $1 \le d' \le d$. This
proves the second inequality in Fact 7.15. To verify the first inequality, notice that 
(42) implies 



h -2x20'^' ^\2'iJ l^log'^n- 2x20^2^) l^log'^n 
Since Ik <2Q x 2^ logj n, it follows that 



h - 40 X 20n 2-^+1 log'^+in' ^ ' 

The assumption on K implies that ^5^) — /('^) log"^"*"^ n, so the right most 
formula in (42) is at least 



40 X 20d log'i+i n 

due to the assumption l'^\A\f{n) > n of Lemma 7.6. This verifies the first inequality 
and completes the treatment of Case 1. 

Case 2. l/lk > mk/2. In this case I' = e(mfe/2). Since m/j = mo/4*^ and 

mo > 1^1/6/0 = |A|/601og2n 

we have that 



l'> 



2x4*^ 120 X 4'=log2n' 
So for any 1 < d' < d 



\J '^1 - V4'= X 1201ogn/ ^ 2 2 x 120'''' ' A^' ' 



4'=xl201ogn/ 2 2 x 120''" ' A'^" \og^' 



n 



2 X 120'^' ' 'M'^^ log:

Similar to the previous case, the assumption on K guarantees that (^)'^ > log2^^ n »



log2 n which implies that 



e'^' ft , K 1 

4x m'*' ' 'Md^ log^n ' ' 



for any 1 < d' < d, which proves the second inequality in Fact 7.15. To verify the 
first inequality, notice that 



^iim > 1 (45) 

h -2x120'^^ '^'^4<^^,log^n- ^^^^ 

Similar to the previous case, we use the estimate $l_k \le 20 \times 2^k\log_2 n$. This and (45)
give



Ik - 40 x 120'^ ' '^2 X 4'*Mog'^+^n' 

Here we need the full strength of the assumption on K: i^^^)'^ > fog2^^ n. From 
this and the assumption that l'^\A\f{n) > n, it follows that 

- ^^^^ ^07120^"^°^^" » 

completing the proof. □ 

8. Proof of Theorem 7.1 (continued) 

Thanks to Corollary 7.7, from now on we can assume that A is a subset of a GAP P
of rank $d_1$ and volume at most $|A|\log^{\alpha} n$, where both $d_1$ and $\alpha$ are constants
depending on d. We first use this structural property to create a set B whose
elements have high multiplicity with respect to A. The set B is a candidate for the
set $A'$ in a perfect triple that we desire. After having created B, the remaining
(and also the hard) part of the proof is to show that there is a sufficiently large
$l' \le l/2$ such that each element of $l'B$ can be represented as a sum of $2l'$ distinct
elements of A. This part requires a non-trivial extension of the tiling argument
used in our earlier paper [28]. In order to carry out this extension we need to prove
some new properties of proper GAPs.



This section is organized as follows. In subsection 8.1, we define the set B and
derive several properties of this set. This subsection also contains a proof of the
theorem for the case when l is relatively small compared to (see Corollary 8.2).
The next subsection, subsection 8.3, is devoted to the study of proper GAPs. The
results of this subsection will be used in subsection 8.6 to prove further properties
of the set B. In subsection 8.7, we specify a plan for constructing a sumset $l'B$ as
desired. This plan is executed in the next three subsections, 8.8, 8.10 and 8.11. The
final subsection, subsection 8.12, discusses a common generalization of Theorem 5.1
and Theorem 7.1.



8.1. Sets with high multiplicity. We are going to show that there is a large
set every element of which has high multiplicity with respect to $A$. Consider a
monotone sequence $m_1, m_2, \ldots$ and let $S_i$ be the set of numbers with multiplicities
between $m_i$ and $m_{i+1}$. A natural way to find a large set with high multiplicity is to
set $m_i = 2^i$ and proceed as in subsection 7.3. Here, however, we shall set the $m_i$'s
somewhat differently, in order to serve a purpose which will become clear later.

We define $m_i = \frac{|A|}{i2^i}$ for all $i = 1, 2, \ldots, \log_2 |A|$ (observe that the sequence $m_i$ is
decreasing). Let $S_i$ be the set of those numbers whose multiplicity with respect
to $A$ is less than $m_i$ and at least $m_{i+1}$. A simple double counting shows



$$\sum_{i=1}^{\log_2 |A|} m_i|S_i| \ge \binom{|A|}{2}. \qquad (46)$$
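As a sanity check on the double counting behind (46), the following Python sketch computes multiplicities for a toy set. It assumes that the multiplicity of $x$ with respect to $A$ counts the unordered pairs of distinct elements of $A$ summing to $x$ (the definition is given earlier in the paper and is not restated in this section); the set `A`, the list `thresholds` and the function names are illustrative choices, not part of the original argument.

```python
from collections import Counter
from itertools import combinations
from math import comb


def multiplicities(A):
    """For each x, count unordered pairs {a, b} of distinct elements of A with a + b = x."""
    return Counter(a + b for a, b in combinations(sorted(set(A)), 2))


def level_sets(A, thresholds):
    """S_i = {x : m_{i+1} <= mult(x) < m_i} for a decreasing list m_1 > m_2 > ..."""
    mult = multiplicities(A)
    return [
        {x for x, m in mult.items() if lo <= m < hi}
        for hi, lo in zip(thresholds, thresholds[1:])
    ]


A = set(range(1, 7))        # toy set {1, ..., 6}
mult = multiplicities(A)
thresholds = [4, 2, 1]      # hypothetical decreasing thresholds m_1 > m_2 > m_3
S = level_sets(A, thresholds)

# Every pair is counted once, so the multiplicities add up to C(|A|, 2);
# replacing each multiplicity by the upper threshold of its level can only increase the sum.
weighted = sum(m_i * len(S_i) for m_i, S_i in zip(thresholds, S))
```

Here `weighted` plays the role of the left-hand side of (46) and `comb(len(A), 2)` that of the right-hand side.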



Now we are going to make some use of the structure of $A$. Since $A$ is a subset of
a GAP $P$, $2A$ is a subset of $2P$. On the other hand, as $P$ is a GAP of constant
rank and volume $O(|A|\log_2^a n)$, $2P$ is a GAP with the same rank and volume
$O(|A|\log_2^a n)$. The set $S_i$ (for all $i$) is a subset of $2A$, so it follows that

$$|S_i| \le |2A| \le |2P| = O(\mathrm{Vol}(2P)) = O(|A|\log_2^a n). \qquad (47)$$



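By (47), the sum of those $m_i|S_i|$ where $m_i \le \frac{|A|}{\log_2^{a+2} n}$ is at most

$$O\Big(\frac{|A|}{\log_2^{a+2} n}\,|A|\log_2^a n\Big) \times \log |A| = O\Big(\frac{|A|}{\log_2^{a+2} n}\,|A|\log_2^{a+1} n\Big) = o(|A|^2). \qquad (48)$$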



This estimate allows us to omit these terms from the sum in (46) and so significantly
reduce the number of terms in the sum. Notice that for any $i > \log_2 \log_2^{a+2} n$,
$m_i < \frac{|A|}{\log_2^{a+2} n}$, so we only have to look at the small $i$'s, $i \le \log_2 \log_2^{a+2} n$. From (46)
and (48), we have

$$\sum_{i=1}^{\log_2 \log_2^{a+2} n} \frac{|A|}{i2^i}|S_i| = \sum_{i=1}^{\log_2 \log_2^{a+2} n} m_i|S_i| \ge \binom{|A|}{2} - o(|A|^2) = \Big(\frac12 - o(1)\Big)|A|^2. \qquad (49)$$



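The fact that $\sum_{i=1}^{\infty} i^{-2} = \pi^2/6$ and (49) imply that there should be an index
$1 \le i \le \log_2 \log_2^{a+2} n$ so that

$$|S_i| \ge \frac{6}{\pi^2} \cdot \frac{2^i}{i}\Big(\frac12 - o(1)\Big)|A| \ge \frac14 \cdot \frac{2^i}{i}|A|.$$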



Choose the smallest $i$ satisfying the above inequality and rename the corresponding
set $S_i$ to $B$. We are going to work with $B$ in the rest of the proof. We set
$l_1 = \frac{|A|}{(i+1)2^{i+1}}$. Since we shall use the letter $i$ as an index later, let us set $t = 2^{i+1}$
to avoid confusion. Under this new notation, $l_1 = \frac{|A|}{t\log_2 t}$, where $t = 2^{i+1}$ is at
most $2^{\log_2 \log_2^{a+2} n + 1} < \log_2^{a+3} n$. By the definition of the $S_i$'s, every element of $B$
has multiplicity at least $l_1$ with respect to $A$. This implies that $kB$ is a subset of
$(2k)^*A$ for any $k \le l_1$. Now let us consider two cases:

Case 1: $l \le 2l_1$. In this case, we set $A' = B$, $l' = l/2$ and $n' = 2n$ and follow
the plan described in subsection 7.2. It is easy to verify that the triplet $(A', l', n')$
is perfect. Thus we have the following corollary, which proves Theorem 7.1 for the
case when $l$ is relatively small compared to $|A|$.

Corollary 8.2. For any fixed positive integer $d$ there are positive constants $C$, $c$
and $\beta$ depending on $d$ such that the following holds. For any positive integers $n$ and
$l$ and any set $A \subset [n]$ satisfying $l \le \frac{|A|}{\log^{\beta} n}$ and $l^d|A| \ge Cn$, $l^*A$ contains a proper
GAP of rank $d'$ and volume at least $cl^{d'}|A|$, for some $1 \le d' \le d$.



In the remaining part of the paper, we consider the case $l > 2l_1$. Before going to
the next subsection, let us summarize what we have at this stage. We have created
a set $B \subset 2A \subset [2n]$ where

• $B$ has at least $\frac{|A|t}{4\log_2 t}$ elements.

• Each element of $B$ has multiplicity at least $l_1 = \frac{|A|}{t\log_2 t}$ with respect to $A$.

• $t < \log_2^{a+3} n$.



8.3. Proper GAPs revisited. If both $A$ and $2A$ are subsets of a normal GAP $Q$, it
is tempting to conclude that $A$ is a subset of $\frac12 Q$. A naive "proof" would go as
follows: Assume that there is an element $x \in A \setminus \frac12 Q$. Since $A \subset Q$, $x \in Q \setminus \frac12 Q$ and
so $2x \in 2Q \setminus Q$. But $2x \in 2A \subset Q$, a contradiction.

The trap is in the second sentence. Reasonable as it sounds, the statement "$x \in Q \setminus \frac12 Q$
implies $2x \in 2Q \setminus Q$" is not true. It is not hard to work out an example where
$2x \in Q \cap 2Q$. We can, however, easily avoid this subtlety. If we assume that $2Q$ is
proper then $x \in Q \setminus \frac12 Q$ indeed implies that $2x \in 2Q \setminus Q$. Thus we can conclude

Fact 8.4. If both $A$ and $2A$ are subsets of a normal GAP $Q$ and $2Q$ is proper, then $A$
is a subset of $\frac12 Q$.
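To make the trap concrete, here is a small Python sketch of one possible example. It assumes the usual conventions, namely that a normal GAP is $Q = \{\sum x_ia_i \mid 0 \le x_i \le n_i\}$, that $\frac12 Q$ is obtained by halving every $n_i$, and that a GAP is proper when all its formal sums are distinct; the differences $(1, 3)$, the lengths and the helper names are hypothetical choices made for illustration.

```python
from itertools import product


def gap_elements(diffs, lengths):
    """Elements of {x_1 a_1 + ... + x_d a_d : 0 <= x_i <= n_i}."""
    return {
        sum(x * a for x, a in zip(xs, diffs))
        for xs in product(*(range(n + 1) for n in lengths))
    }


def is_proper(diffs, lengths):
    """A GAP is proper when its number of elements equals its volume."""
    volume = 1
    for n in lengths:
        volume *= n + 1
    return len(gap_elements(diffs, lengths)) == volume


diffs = (1, 3)
Q = gap_elements(diffs, (2, 2))        # {0, ..., 8}, proper
half_Q = gap_elements(diffs, (1, 1))   # {0, 1, 3, 4}
two_Q = gap_elements(diffs, (4, 4))    # {0, ..., 16}, not proper

x = 2   # x lies in Q but not in (1/2)Q, and yet 2x = 4 still lies in Q
```

In this example $Q$ is proper but $2Q$ is not, which is exactly the situation excluded by the extra hypothesis of Fact 8.4.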



The above fact motivates the following lemma, which is the main result of this
subsection. We assume $Q$ is normal and its edges are divisible by $l$, so $\frac1l Q$ can be
defined.

Lemma 8.5. For any constants $d$ and $g$ there are constants $\gamma$ and $k$ such that
the following holds. Let $B$ be a finite set of integers, $l$ a positive integer and $Q$ a
(normal) proper GAP of rank $d$ satisfying

• The union of $g$ translations of $Q$ covers $lB$.

• $kQ$ is proper.

Then there is a translation $B_1$ of $B$ such that $B_1 \cap \frac1l Q$ has at least $\gamma|B|$ elements.

Proof of Lemma 8.5. We can assume, without loss of generality, that $B$ contains
0. The normal GAP $Q$ can be represented as $Q = \{\sum_{i=1}^d x_ia_i \mid 0 \le x_i \le n_i\}$. If $lB$
is covered by $g$ translations of $Q$ then $lB - lB$ is covered by $g_1 = g^2$ translations
of $P = Q - Q$, which has the form $P = \{\sum_{i=1}^d x_ia_i \mid -n_i \le x_i \le n_i\}$. Let $P_1 = \frac12 P$
and $P_2 = \frac12 P_1$; it is clear that $P_1$ is a translation of $Q$. Since $g_1$ translations of
$P$ cover $lB - lB$ and each translation of $P$ is the union of $2^d$ translations of $P_1$,
$lB - lB$ is covered by $2^dg_1$ translations of $P_1$. Furthermore, as each translation of
$P_1$ is the union of $2^d$ translations of $P_2$, $lB - lB$ is covered by $4^dg_1$ translations of
$P_2$.

Since $0 \in B$, $lB - lB$ contains $B$. By the pigeonhole principle, there is a translation
of $P_2$ containing at least a $\gamma$ fraction of $B$ (for a constant $\gamma > 0$ depending only on
$d$ and $g$). Equivalently, $P_2$ contains a set
$B' \subset a + B$ where $|B'| \ge \gamma|B|$ and $a$ is an integer. Setting $k = 2^{d+2}g_1$ and
$h = 2^{d+1}g_1 + 1$, we are going to show that $B' - B'$ is a subset of $\frac hl P_1$. Since $B' - B'$
contains a subset of constant density of a translation of $B$ and $P_1$ is a translation of
$Q$, it follows that there is a translation of $B$ which intersects $\frac hl Q$ in $\Omega(|B|)$ elements.
This implies the claim of the lemma since $\frac hl Q$ is the union of $h^d = O(1)$ translations
of $\frac1l Q$.

In the rest of the proof, let us assume, for the sake of a contradiction, that there
is an element $x$ of $B' - B'$ not belonging to $\frac hl P_1$. Since $B' - B'$ is a subset of
$P_2 - P_2 = P_1$, $x$ is an element of $P_1$. Let $s_1$ be the smallest positive integer such
that $s_1x \in 2P_1 \setminus P_1$. Since both $2P_1$ and $P_1$ are proper, $s_1$ is at most $l/h$.

Recall that $B'$ is a subset of $a + B$. So, an element of $B'$ has the form $a + b$ where
$b \in B$. As $x \in B' - B'$, $x = b_1 - b_2$ for some $b_1, b_2 \in B$. We set $y = s_1x$ and consider
the sequence $y, 2y, 3y, \ldots, \lfloor l/s_1 \rfloor y$. As $s_1 \le l/h$, $\lfloor l/s_1 \rfloor \ge h > 2^{d+1}g_1$. Each element
of the above sequence has the form $rb_1 - rb_2$ for some $r \le l$. Since $0 \in B$, these
elements belong to $lB - lB$. Let us now restrict ourselves to the subsequence

$$y, 2y, \ldots, (2^{d+1}g_1 + 1)y.$$

Recall that $lB - lB$ is a subset of the union of $2^dg_1$ translations of $P_1$. The pigeonhole
principle implies that there should be a translation, say $a' + P_1$, containing two
elements $iy$ and $jy$ where $2 \le i - j \le 2^{d+1}g_1$. The difference $(i - j)y$ is an element
of $(a' + P_1) - (a' + P_1) = P_1 - P_1 = 2P_1$. Since $i - j \le 2^{d+1}g_1 = k/2$, $2(i - j)P_1$
is proper by the second assumption of the lemma. Moreover, $y$ is an element of
$2P_1 \setminus P_1$ so $(i - j)y$ is an element of $2(i - j)P_1 \setminus (i - j)P_1$. This is a contradiction
because $(i - j)P_1$ contains $2P_1$ as $i - j \ge 2$. □



8.6. Properties of $B$. Let us consider the set $l_1B$. By the lower bounds on $l_1$ and
$|B|$ (see the last paragraph of subsection 8.1) we have

$$l_1^d|B| \ge \frac{|A|^{d+1}}{4t^{d-1}\log_2^{d+1} t}.$$

The assumptions $l^d|A| \ge Cn$ and $l \le |A|/2$ of Theorem 7.1 guarantee that $|A|^{d+1} \ge Cn$
and so

$$l_1^d|B| \ge \frac{Cn}{4t^{d-1}\log_2^{d+1} t}.$$

The factor $t^{d-1}\log_2^{d+1} t$ is the main source of our troubles. If $t$ is a constant bounded
by a function of d (say eJ^ ) , then by increasing the value of C we can assume that 
$\frac{C}{4t^{d-1}\log_2^{d+1} t}$ is sufficiently large and so Theorem 3.12 can be applied. However, $t$
can be as large as a positive power of $\log_2 n$ and in general cannot be bounded by
any function of $d$.

In the remaining part of the proof, we assume that t is very large compared to d 
(for all purposes, it is sufficient to assume, say, t > ). We are going to find 
a way to play this assumption to our advantage (and through our arguments one
will see the reason for the somewhat artificial definition of the $m_i$'s). In the remaining
part of this subsection, we use Lemma 8.5 to derive some properties of B which are 
useful for us. 

Let us start with the usual "doubling" trick. Set $B_0 = B$ and define $B_{i+1} = 2B_i$.
We claim that at some stage we will be in a position to apply Lemma 8.5.

It is easy to show (using an argument similar to those used in the proof of Theorem
3.12) that there is some $s$ with $2^s \le l_1$ satisfying $|2B_s| \le (2^{d+2} - 1)|B_s|$.
As usual, we let $s$ be the smallest number with this property. By Lemma 4.9,
$B_s$ is a subset of a constant number of translations of a GAP $P_0$ of rank $d + 1$
where $\mathrm{Vol}(P_0) = O(|B_s|)$. Moreover, the proper filling lemma implies that there
is a constant $g_1$ so that $g_1B_s$ contains a proper GAP $P_1$ of rank $d + 1$ whose
volume is $\Theta(|B_s|)$. The differences of $P_1$ are constant multiples of the corresponding
differences of $P_0$, so $P_0$ is covered by a constant number of translations of $P_1$.
Therefore, $B_s$ is covered by a constant number of translations of the proper GAP
$P_1$.


In order to apply Lemma 8.5, we also need the assumption that there is a sufficiently
large constant $k_1$ such that $k_1P_1$ is proper. Unfortunately, nothing guarantees the
existence of $k_1$. However, if we cannot find $k_1$, then we can use our "rank reduction"
argument. Let $k_1$ be a sufficiently large constant and consider the sequence
$P_1, 2P_1, 4P_1, \ldots$ If for some $i \le \log_2 k_1$, $2^iP_1$ fails to be proper, then by the rank
reduction argument, we can find a proper GAP $P_2$ of rank strictly less than the
rank of $P_1$ such that the following two properties hold:

• There is a constant $g_2$ such that $g_2P_1$ contains $P_2$.

• A constant number of translations of $P_2$ cover $P_1$.

It follows that a constant number of translations of $P_2$ cover $B_s$. Now repeat the
above argument with $P_2$. As the rank decreases each time, we should be done after
a constant number of steps. According to our arguments, the final proper GAP
(for which the assumptions of Lemma 8.5 are satisfied) still has volume $\Omega(|B_s|)$.
We call this final GAP $P'$.

By applying Lemma 8.5 to $P'$ we obtain a few new properties of $B$:

• For some $m = O(2^s)$, $mB$ contains a GAP $P'$ which has volume at least



Moreover, since $l_1 \gg 2^s$, $m \ll l_1$.

• There is a subset $B'$ of $B$ such that $|B'| \ge \gamma|B|$ and $B'$ is a subset of a
GAP $P$ which is a translation of $\frac1m P'$.

Since we are allowed to ignore constant factors, we assume that $B' = B$ for convenience.
Moreover, without loss of generality, we could assume that $P'$ has symmetric
form, namely, $P' = \{a_1x_1 + \dots + a_{d_1}x_{d_1} \mid -n_i \le x_i \le n_i\}$.

8.7. A plan. Let us now give a rough discussion of our plan:

• We are going to find a set $T$ of $l_2$-tuples in $B$ (a $k$-tuple is a set of $k$ not
necessarily different elements) such that the sum of the elements in any
tuple is an element of $(2l_2)^*A$, where $l_2 \gg l_1$ is a parameter to be defined.
Let $S$ be the collection of the sums of the tuples in $T$. We create $T$ in a
particular manner so that $S$ is sufficiently dense in $l_2B$.

• We next prove that $S + l_1B$ contains $l_2B$, relying on the fact that $S$ is dense
in $l_2B$. This way we obtain the sum set $l_2B$ where $l_2$ is significantly larger
than $l_1$.

• Since $S$ is a subset of $(2l_2)^*A$ and $l_1B$ is a subset of $(2l_1)^*A$, $S + l_1B$ is
a subset of $(2l_2)^*A + (2l_1)^*A$. The obvious obstacle here is that the same
element of $A$ might be used twice, once in $(2l_2)^*A$ and once in $(2l_1)^*A$. We
overcome this problem in subsection 8.10 and show that $l_2B$ is in fact a
subset of $(2l_2 + 2l_1)^*A$.

We call this plan a tiling operation as what it does is to tile many copies of $l_1B$
together to get a bigger set $l_2B$.

Would we be done after a successful implementation of this plan? Well, we would
be in a very good position if we could guarantee that $l_2^d|B| \gg n$ (this inequality is
necessary for an application of Theorem 3.12 to $l_2B$). In the case $d = 1$, we can
do this and the above plan was carried out successfully in an earlier paper [27].
Unfortunately, there is a serious difference between the two cases $d = 1$ and
$d \ge 2$. For $d = 1$, the troublesome factor $t^{d-1}\log_2^{d+1} t$ is only $\log_2^2 t$ and there is a
way to set up $l_2$ so this poly-logarithmic factor can be ignored. On the other hand,
in the general case $d \ge 2$, the troublesome factor is a polynomial in $t$ (which is of a
different order of magnitude) and even the optimal value we could get for $l_2$ would
not be enough to kill this factor.

We are going to resolve this problem by repeating the second step of the plan many
times. Roughly speaking, what we shall do is to put many original tiles (copies of
the set $l_1B$) together to get a larger tile $l_2B$. Next, we put many copies of $l_2B$
together to get an even larger tile $l_3B$ and so on. We repeat the operation until we
get a sufficiently large tile $l_kB$ which satisfies $l_k^d|B| \gg n$.

There is a trade-off in this argument. The repetitions make the problem mentioned
in the last step of the above plan more severe: Now the same element of $A$ might be
used as many as $k$ times. Luckily, our treatment for this problem is not sensitive
to this modification as long as $k$ remains a constant, which is the case.

Finally, let us go back to address the first step: How can we find $l_2$ elements of $B$
such that their sum can be represented as the sum of $2l_2$ different elements of $A$?
The main idea is as follows: An element of $B$ has multiplicity $l_1$ with respect to
$A$, so it gives us $l_1$ pairs of elements of $A$, all having the same sum. Therefore, a set
of $m$ different elements of $B$ gives us $l_1m$ different pairs. On the other hand, each
element in $A$ occurs in at most $|A| - 1 < |A|$ pairs. Using the greedy algorithm,
we can find at least $\frac{l_1m}{2|A|}$ mutually disjoint pairs. Thus, for any $l_2 \le \frac{l_1m}{2|A|}$, we have
a collection of $l_2$ mutually disjoint pairs. Clearly, the sum of the $l_2$ elements of $B$
corresponding to these pairs is an element of $(2l_2)^*A$.

The critical feature of this step is how to choose the set of $m$ elements of $B$. We
discuss this issue in the next paragraph.
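The greedy step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy set `A`, the list `targets` (standing in for the chosen elements of $B$) and the function names are hypothetical, and pairs are taken to be unordered pairs of distinct elements of $A$.

```python
def pairs_summing_to(A, x):
    """All unordered pairs (a, x - a) of distinct elements of A with sum x."""
    return [(a, x - a) for a in sorted(A) if a < x - a and (x - a) in A]


def greedy_disjoint_pairs(pairs):
    """Scan the pairs once and keep every pair that avoids all elements used so far."""
    used, chosen = set(), []
    for a, b in pairs:
        if a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
    return chosen


A = set(range(1, 21))          # toy ground set {1, ..., 20}
targets = [21, 22, 23]         # stand-ins for the chosen elements of B
all_pairs = [p for x in targets for p in pairs_summing_to(A, x)]
chosen = greedy_disjoint_pairs(all_pairs)
```

Each kept pair can block fewer than $2|A|$ other pairs, so the greedy scan always keeps at least `len(all_pairs) / (2 * len(A))` pairs, which is the counting used in the text.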



8.8. The Tiling Operation: Start. Let us start with the execution of the first
step. Recall, from the last paragraph of subsection 8.6, that $B$ is a subset of a
proper GAP $P$ of constant rank $d_1$ (the value of $d_1$ is irrelevant but we do know
that $d_1 \le d + 1$). It is easier for the reader to visualize the argument if he/she
identifies $P$ with a $d_1$-dimensional box. Partition each edge of $P$ into $T_1$ intervals
of equal length, where $T_1$ is a parameter to be determined. The products of these
intervals partition $P$ into $T_1^{d_1}$ identical small boxes. A small box $Q$ is dense if the
number of elements of $B$ in $Q$ is at least $\frac{|B|}{2T_1^{d_1}}$; it is sparse otherwise. The sparse
boxes contain at most half of the elements of $B$, so at least half of the elements of
$B$ should be contained in dense boxes. Since constants like 1/2 do not play any
significant role, we assume, for the sake of convenience, that all elements of $B$ are
contained in dense boxes.
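The dense/sparse dichotomy can be illustrated with a short Python sketch. The point set, the side length and the function names below are hypothetical; the sketch identifies the GAP with a box $[0, \text{side})^d$ of integer points, as the text suggests doing.

```python
from collections import Counter


def box_index(point, side, T):
    """Index of the small box containing `point` when each edge is cut into T intervals."""
    return tuple(c * T // side for c in point)


def split_dense_sparse(points, side, T):
    """Return the set of dense boxes and the number of points in every non-empty box."""
    d = len(points[0])
    counts = Counter(box_index(p, side, T) for p in points)
    threshold = len(points) / (2 * T ** d)      # |B| / (2 T^d)
    dense = {box for box, cnt in counts.items() if cnt >= threshold}
    return dense, counts


# 36 points clustered in one quarter of a 12 x 12 box, plus three stray points.
points = [(x, y) for x in range(6) for y in range(6)] + [(7, 7), (11, 0), (0, 11)]
dense, counts = split_dense_sparse(points, side=12, T=2)
in_dense = sum(counts[box] for box in dense)
```

Since there are at most $T^d$ sparse boxes, each holding fewer than $|B|/(2T^d)$ points, `in_dense` is always at least half of `len(points)`.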

Let us recall that $|B| \ge \frac{|A|t}{4\log_2 t}$ and $l_1 = \frac{|A|}{t\log_2 t}$. By throwing away dummy elements,
we can assume that $|B|$ is exactly $\frac{|A|t}{4\log_2 t}$.

Consider a dense box $Q$; each element $x \in B \cap Q$ has multiplicity $l_1$ with
respect to $A$. We set the number $m$ in the last paragraph of the previous subsection
to be $\frac{|B|}{2T_1^{d_1}}$; as $Q$ is dense we are guaranteed to find this many elements of $B$
in $Q$. The argument in the above mentioned paragraph shows that we can have at
least

$$\frac{l_1|B|}{4T_1^{d_1}|A|}$$

disjoint pairs. For a technical reason, we do not set $l_2$ equal to this value, but to
one-third of it:

$$l_2 = \frac{l_1|B|}{12T_1^{d_1}|A|}.$$

For $x \in B$ let $N_x$ be the collection of pairs (in $A$) summing up to $x$. We have
proved

Fact 8.9. For each dense box $Q$, the union of the $N_x$'s ($x \in B \cap Q$) contains at least
$3l_2$ mutually disjoint pairs.

Substituting the values of $l_1$ and $|B|$ into the formula of $l_2$, we have

$$l_2 = \frac{|A|}{48T_1^{d_1}\log_2^2 t}. \qquad (50)$$

For each dense box $Q$, fix a collection $N_Q$ of $3l_2$ disjoint pairs. For a pair $(a, b)$ in
$N_Q$, the number $a + b$ is a point of the box $Q$ ($a + b \in B \cap Q$). In the following,
we denote by $D_Q$ the collection of these points; $D_Q$ is a multi-set as different pairs
may have the same sum. Let $D$ be the union of the $D_Q$'s.

Let us now take a closer look at the set $l_2B$. An element $x$ of this set can be
written as $x = x_1 + \dots + x_{l_2}$, where the $x_i$'s are not necessarily different elements of
$B$. Moreover, we assumed that every element of $B$ is in some dense box, so each
$x_i$ is in some dense box $Q$ (different $x_i$'s may, of course, belong to different boxes).
Fix a dense box $Q$; for each $x_i \in Q$, we are going to replace it by some $y_i \in D_Q$.
Now comes a very important point. Since $|D_Q| \ge 3l_2$ for any dense box $Q$, we can
replace $x_1, \ldots, x_{l_2}$ with elements $y_1, \ldots, y_{l_2}$ with the following property: There are
mutually disjoint pairs $(a_1, a_1'), \ldots, (a_{l_2}, a_{l_2}')$ of elements of $A$ such that $a_i + a_i' = y_i$. To
see this, let us consider the following rule. For $x_1$, choose an arbitrary pair $(a_1, a_1')$
from $D_{Q_1}$ where $Q_1$ is the dense box containing $x_1$; set $y_1 = a_1 + a_1'$. Assume that
$(a_1, a_1'), \ldots, (a_{i-1}, a_{i-1}')$ have been chosen. Consider $x_i$ and the set $D_{Q_i}$ where $Q_i$
is the dense box containing $x_i$. Delete from $D_{Q_i}$ every pair which has a non-empty
intersection with the chosen pairs. Since the pairs in $D_{Q_i}$ are disjoint, any pair
$(a_j, a_j')$ ($1 \le j \le i - 1$) could intersect at most 2 pairs in $D_{Q_i}$, so we delete at most

$$2(i - 1) \le 2(l_2 - 1) < 2l_2$$

pairs from $D_{Q_i}$. But $D_{Q_i}$ contains $3l_2$ pairs so there are always some pairs left and
we choose an arbitrary one among these.
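The replacement rule above amounts to a sequential selection, sketched here in Python. The two collections `D_Q1` and `D_Q2` are made-up examples of $3l_2$ mutually disjoint pairs for two dense boxes, with the hypothetical value $l_2 = 2$; the function name is likewise an assumption.

```python
def choose_disjoint_pairs(collections):
    """For each step, pick the first pair that shares no element with earlier picks."""
    used, chosen = set(), []
    for pairs in collections:
        # A valid pair exists whenever the collection holds at least
        # 3 * len(collections) mutually disjoint pairs, as in the text.
        pick = next(p for p in pairs if used.isdisjoint(p))
        chosen.append(pick)
        used.update(pick)
    return chosen


l2 = 2
D_Q1 = [(1, 10), (2, 9), (3, 8), (4, 7), (5, 6), (11, 20)]   # box containing x_1
D_Q2 = [(1, 12), (2, 11), (3, 10), (4, 9), (5, 8), (6, 7)]   # box containing x_2
chosen = choose_disjoint_pairs([D_Q1, D_Q2])
y = [a + b for a, b in chosen]   # the replacement elements y_1, y_2
```

In the second step the pair $(1, 12)$ is skipped because it reuses the element 1, which mirrors the deletion step in the argument.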



The disjointness of the chosen pairs guarantees that $y = y_1 + \dots + y_{l_2}$ can be
represented as a sum of exactly $2l_2$ different elements from $A$. Let $T$ denote the
collection of the tuples $(y_1, \ldots, y_{l_2})$ and $S$ be the collection of their sums. Following
the plan, we next show that $S + l_1B$ contains $l_2B$.

Consider $x = x_1 + \dots + x_{l_2}$. Since $x_i \in B$ and $B \subset P$, each of the $x_i$'s is an element of the
box $P$ and can be viewed as a point in $\mathbb{Z}^{d_1}$, so we can view $x$ as a vector in $\mathbb{Z}^{d_1}$.
By replacing $x_i$ with $y_i$, we obtain another vector $y = \sum_{i=1}^{l_2} y_i$. We are going to
find a box $P_1$ centered at the origin so that $P_1$ is a subset of $l_1B$ and the difference
$x - y = \sum_{i=1}^{l_2} (x_i - y_i)$ is a vector in $P_1$. The union of the copies of such a $P_1$ centered
at the points of $S$ covers $l_2B$. As $P_1 \subset l_1B$, it follows that $l_2B \subset S + l_1B$, as desired.

The key observation in what follows is that $x_i - y_i$ is small because they are in
the same small box (this is the main reason why we partition $P$ into many small
boxes). Let us fix an edge of $P$ and assume that its length is $s_1$. The absolute value
of the component of $x_i - y_i$ in the direction of this edge is at most $s_1/T_1$. It follows
that the corresponding component of $x - y$ is at most $l_2s_1/T_1$. We are going to
choose $T_1$ and define $P_1$ so that this bound is at most half the length of the
corresponding edge of $P_1$ ($P_1$ is centered at the origin). This would imply that $P_1$
contains the vector $x - y$.

Now we are going to define $P_1$. The last paragraph of subsection 8.6 tells us that
$mB$ contains a GAP $P' = mP$, for some $m \ll l_1$. Thus $l_1B$ contains the box
$\frac{l_1}{m}P' = l_1P$. This is our box $P_1$. Observe that $P_1$'s edge in the relevant direction
has length $s_1l_1$. In order to guarantee that this length is at least twice $l_2s_1/T_1$, we
should set $T_1$ so that

$$s_1l_1 \ge \frac{2l_2s_1}{T_1} = \frac{s_1|A|}{24T_1^{d_1+1}\log_2^2 t}. \qquad (51)$$



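To satisfy (51), it is sufficient to set

$$T_1 = \Big(\frac{|A|}{24l_1\log_2^2 t}\Big)^{1/(d_1+1)} = \Big(\frac{t}{24\log_2 t}\Big)^{1/(d_1+1)},$$

since $l_1 = |A|/(t\log_2 t)$. For the sake of a cleaner calculation, we set $T_1$ a little bit
larger

$$T_1 = t^{1/(d_1+1)}.$$

Substituting the above value of $T_1$ into the definition of $l_2$ in (50), we obtain

$$l_2 = \frac{|A|}{48T_1^{d_1}\log_2^2 t} = \frac{|A|}{48t^{d_1/(d_1+1)}\log_2^2 t} \ge \frac{|A|}{t^{d_1/(d_1+1)}\log_2^3 t}. \qquad (52)$$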

This $l_2$ is still not large enough, namely, $l_2^d|B|$ could still be smaller than $n$. Indeed,
the above lower bound on $l_2$ only guarantees that

$$l_2^d|B| \ge \frac{|A|^d}{t^{dd_1/(d_1+1)}\log_2^{3d} t} \cdot \frac{|A|t}{4\log_2 t} = \frac{|A|^{d+1}}{4t^{dd_1/(d_1+1)-1}\log_2^{3d+1} t}, \qquad (53)$$

where the right hand side can be significantly smaller than $n$ if $|A| = O(n^{1/(d+1)})$
and $dd_1/(d_1 + 1) - 1 > 0$. Our plan is to increase the value of $l_2$ by repeated tiling.



To conclude this subsection, let us discuss the problem that the same element of $A$
might appear twice in a representation of an element of $l_2B$. Observe that $l_2B$ is a
subset of $(2l_1)^*A + (2l_2)^*A$ and thus any element of $l_2B$ is a sum of $2l_1 + 2l_2$ elements of
$A$. However, as we already pointed out, an element of $A$ can appear twice, once in
$(2l_1)^*A$ and once in $(2l_2)^*A$. This problem can be resolved by the so-called cloning
trick, introduced in [28].



8.10. The cloning argument. At the very beginning of the whole proof, we split
the set $A$ into two sets $A'$ and $A''$ in such a way that $|A'| \approx |A''|$ and any number
$x$ which has high multiplicity with respect to $A'$ should have almost the same
multiplicity with respect to $A''$. Next, we continue with $A'$ and keep $A''$ for reserve.
Repeat the whole proof with $A'$ playing the role of $A$ until the previous paragraph.
We call the set of elements with high multiplicity (with respect to $A'$) $B'$ instead
of $B$. Now doing the same with $A''$ we obtain a set $B''$.

The key point now is that with a proper splitting, the two sets $B'$ and $B''$ are
exactly the same. So when we look at $l_2B'$ as a subset of $S + l_1B'$, we can think of
an element of $S$ as a sum of $l_2$ elements from $B''$, rather than from $B'$. Therefore,
when we replace each element from $B'$ and $B''$ by the sum of two elements from $A$,
the elements used for $S$ come from $A''$ and the elements used for $l_1B'$ come from
$A'$ and this guarantees that no element of $A$ is used twice.

A random splitting provides the sets $A'$ and $A''$ as required. For each element of $A$
throw a fair coin. If heads, we put it into $A'$, otherwise it goes to $A''$. If a number
$x$ has multiplicity $m \gg \log n$ with respect to $A$, then standard large deviation
inequalities (such as Chernoff's) tell us that with probability at least $1 - n^{-2}$, $x$
has multiplicities $\frac m4 \pm 10\sqrt{m\log n} = (1 + o(1))\frac m4$ with respect to both $A'$ and
$A''$. Since there are only at most $2n$ numbers $x$ to consider, with probability close
to 1, every $x$ with multiplicity $m \gg \log n$ has approximately the same multiplicities
in $A'$ and $A''$.
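The random splitting can be simulated with a few lines of Python. This is a sketch only: the ground set, the seed and the function names are arbitrary choices, and the multiplicity is again assumed to count unordered pairs of distinct elements.

```python
import random
from collections import Counter
from itertools import combinations


def multiplicities(A):
    """For each x, count unordered pairs of distinct elements of A with sum x."""
    return Counter(a + b for a, b in combinations(sorted(A), 2))


def random_split(A, rng):
    """Fair coin flip per element: heads goes to the first part, tails to the second."""
    first = {a for a in sorted(A) if rng.random() < 0.5}
    return first, set(A) - first


rng = random.Random(0)           # fixed seed, only to make the run reproducible
A = set(range(1, 401))
A1, A2 = random_split(A, rng)    # A1 plays the role of A', A2 the role of A''
m, m1, m2 = multiplicities(A), multiplicities(A1), multiplicities(A2)
x = 401                          # the sum with the most representations: 200 pairs
```

A pair survives inside `A1` only if both of its elements land there, which happens with probability 1/4; since the pairs representing a fixed `x` are disjoint, these events are independent, and one would expect `m1[x]` and `m2[x]` to be close to `m[x] / 4`.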

When we create the set $S_i$ (which we later rename to $B$) in subsection 8.1, any
element $x$ in $S_i$ has multiplicity $m(x)$ at least $\frac{|A|}{2^{i+1}(i+1)} \gg \log n$ with respect to
$A$. So $x$ will have multiplicity roughly $m(x)/4$ with respect to both $A'$ and $A''$.
Thus one can expect that $x$ will appear in both $B'$ and $B''$. The only case we may
have to worry about is when $m(x)$ is very close to a threshold (say $m_i$) and then
(because the error terms can go either way) $x$ might be in $B'$ but not in $B''$ (or
vice versa). This problem is easy to deal with: we just force this $x$ to be in both
$B'$ and $B''$ (of course, forcing $x$ might decrease $l_1$ slightly (by a factor .9, say) but
this does not influence anything).



8.11. The tiling operation: Finish. We repeat the tiling operation in subsection
8.8 with new parameters. Now $P$ is cut into $T_2^{d_1}$ boxes, where $T_2$ is a parameter to
be chosen. Instead of (50), we define

$$l_3 = \frac{|A|}{48T_2^{d_1}\log_2^2 t}. \qquad (54)$$

Here is our key point: in order to obtain $l_3B$, we now add $S$ with $l_2B$, instead of
with $l_1B$ as in subsection 8.8. This means that instead of $P_1$ we can use the larger
box $P_2 = \frac{l_2}{l_1}P_1$. As an analogue of (51), the condition we need on $T_2$ is

$$s_1l_2 \ge \frac{2l_3s_1}{T_2}. \qquad (55)$$

Notice that in the left hand side of (55) we have $l_2$ instead of $l_1$. The fact that $l_2 \gg l_1$
allows us to set $T_2$ much smaller than $T_1$. Consequently, $l_3$ becomes significantly
larger than $l_2$. Repeating this results in a sequence $l_1 < l_2 < l_3 < l_4 < \cdots$, where
for some constant $k$, $l_k$ will be sufficiently large.

Now let us present some computation. The derivation of $T_2$ from (55) is similar to
that of $T_1$ from (51). It is sufficient to set

$$T_2 = \Big(\frac{|A|}{24l_2\log_2^2 t}\Big)^{1/(d_1+1)}$$



in order to satisfy (55). Since I2 > jjud^rh^t' 



/ \A\ i/(di+i) 
V24/2log^ J " V 24 ) 



SO we can set T2 = ( - — 24 — j ■ Again, for convenience, we set T2 a bit 

larger 



$$T_2 = t^{d_1/(d_1+1)^2},$$

which implies

$$l_3 = \frac{|A|}{48T_2^{d_1}\log_2^2 t} \ge \frac{|A|}{t^{(d_1/(d_1+1))^2}\log_2^3 t}. \qquad (56)$$



By induction, we can show

$$l_k \ge \frac{|A|}{t^{(d_1/(d_1+1))^{k-1}}\log_2^3 t}. \qquad (57)$$

By choosing $k$ sufficiently large (say, $k = 2(d_1 + 1)\log(d + 1)$), we have (using the
fact that $t$ is much larger than $d$)



$l_k$ is now sufficiently large, namely, it satisfies the critical inequality $l_k^d|B| \gg n$ (one
can easily check this by substituting $|B| = \frac{|A|t}{4\log_2 t}$). This inequality provides the
necessary condition we need to apply Theorem 3.12 to the set $l_kB$.
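To see why a constant number of tilings suffices, note that each tiling multiplies the exponent of $t$ by $d_1/(d_1+1)$. The Python sketch below checks numerically that $k = \lceil 2(d_1+1)\ln(d+1) \rceil$ steps shrink this factor below $(d+1)^{-2}$; reading the "log" in the suggested value of $k$ as the natural logarithm and rounding up are assumptions made here, and the function names are illustrative.

```python
import math


def tilings_needed(d, d1):
    """k = ceil(2 (d1 + 1) ln(d + 1)); the natural logarithm is an assumption."""
    return math.ceil(2 * (d1 + 1) * math.log(d + 1))


def exponent_after(k, d1):
    """Factor by which the exponent of t has shrunk after k tilings."""
    return (d1 / (d1 + 1)) ** k


checks = [
    (d, d1, tilings_needed(d, d1), exponent_after(tilings_needed(d, d1), d1))
    for d in range(1, 6)
    for d1 in range(1, d + 2)       # the text only guarantees d1 <= d + 1
]
```

The bound follows from $(1 - \frac{1}{d_1+1})^k \le e^{-k/(d_1+1)}$, so the check in `checks` is an illustration of a general inequality rather than new information.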

Our proof shows that $l_kB$ is a subset of $(2l_k)^*A + (2l_{k-1})^*A + \dots + (2l_1)^*A$. In this
sum an element of $A$ might be used $k$ times. This problem can be handled using the
cloning argument exactly as before, with the only formal modification that instead
of splitting $A$ into two subsets, we split it into $k$ subsets.

To be completely done, there is one last issue we need to discuss and that is the
magnitude of the sum $l_1 + \dots + l_k$.

As we have shown (with the aid of cloning), the set $l_kB$ is a subset of

$$(2l_1 + \dots + 2l_k)^*A = \tilde{l}^*A,$$

where $\tilde{l} = 2l_1 + \dots + 2l_k$. We need to compare $\tilde{l}$ with $l$ and naturally there are two
cases. If $\tilde{l} \le l$, then we set $A' = B$, $l' = l_k$ and $n' = 2n$. In this case, we have



> 


ii\B\ 




> 


, 1^1 


\B\ 


> 


, \A\ 


\A\t 
X — — ■ — 
41og2< 


> 


\Af+H^ 


-{d' + l)/2d 


> 


\Af+^ 




> 


f'\A\, 





for every $1 \le d' \le d$. This guarantees that the triple $(A', l', n')$ is perfect.

In the remaining case when $\tilde{l} = 2l_1 + \dots + 2l_k > l$, there is an index $i < k$ such that

$$2l_1 + \dots + 2l_i \le l < 2l_1 + \dots + 2l_{i+1}.$$

We now modify the tiling operation a little bit. First of all, it is clear that we do
not have to proceed beyond the $i$-th tiling so we make this tiling our last. Moreover,
in this last tiling we shall not use the whole set $l_iB$ as a tile, but only a fraction
of it, say $l_i'B$ for some $l_i' \le l_i$ (as we mentioned many times, our arguments are
invariant with respect to translations so we can assume that $l_i'B$ is a subset of $l_iB$).
As the result, we obtain a set $l'_{i+1}B$ instead of $l_{i+1}B$, for some $l'_{i+1} \le l_{i+1}$. The set
$l'_{i+1}B$ is a subset of $(2l_1 + \dots + 2l_i + 2l'_{i+1})^*A$ where, with a proper choice of $l_i'$, we
can guarantee that

$$l/2 \le (2l_1 + \dots + 2l_i + 2l'_{i+1}) \le l.$$

Now we can set $A' = B$, $l' = l'_{i+1}$, $n' = 2n$ and conclude the proof as discussed in
subsection 7.2. □



8.12. A common generalization of Theorems 5.1 and 7.1. In this subsection,
we present a common generalization of Theorems 5.1 and 7.1. Let us first remind the
reader of the sumsets studied in these two theorems. In Theorem 5.1, we consider a
sum of different sets $A_1, \ldots, A_l$, but allow the same number to appear many times in
a representation (the same number may occur in several $A_i$'s). On the other hand,
in Theorem 7.1 we have only one set $A$ in the sum, but with the restriction that
the summands of a representation must be different. For a common generalization
of these theorems, we consider a sum which involves different elements of different
sets. Let $A_1, \ldots, A_l$ be sets of integers; we define $A_1 \overset{*}{+} A_2 \overset{*}{+} \cdots \overset{*}{+} A_l$ as the
collection of all numbers which can be represented as a sum of $l$ different numbers
$a_1 \in A_1, \ldots, a_l \in A_l$. Formally speaking,

$$A_1 \overset{*}{+} A_2 \overset{*}{+} \cdots \overset{*}{+} A_l = \{a_1 + \dots + a_l \mid a_i \in A_i,\ a_i \ne a_j \text{ for } 1 \le i < j \le l\}.$$

We refer to $A_1 \overset{*}{+} A_2$ as the star sum of $A_1$ and $A_2$.

Theorem 8.13. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $A_1, \ldots, A_l$ be subsets of size $|A|$
of $[n]$ where $l$ and $|A|$ satisfy $l^d|A| \ge Cn$. Then $A_1 \overset{*}{+} A_2 \overset{*}{+} \cdots \overset{*}{+} A_l$ contains a
GAP of rank $d'$ and volume at least $cl^{d'}|A|$, for some integer $1 \le d' \le d$.

About the proof, one's first impression would be that one can prove Theorem 8.13
using Theorem 7.1 the same way one proved Theorem 5.1 using Theorem 3.12.
This, however, is not possible due to a subtle problem involving star sums. While
it is clear that the (set) equality

$$(A_1 + A_2) + (A_3 + A_4) = A_1 + A_2 + A_3 + A_4$$

is true, its star sum counterpart

$$(A_1 \overset{*}{+} A_2) \overset{*}{+} (A_3 \overset{*}{+} A_4) = A_1 \overset{*}{+} A_2 \overset{*}{+} A_3 \overset{*}{+} A_4$$

is false.
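A tiny example makes the failure explicit. The Python sketch below implements the star sum directly from its definition; the four one-element sets are a hypothetical choice made only to exhibit the phenomenon.

```python
from itertools import product


def ordinary_sum(*sets):
    """A_1 + ... + A_l: all sums with one summand from each set."""
    return {sum(c) for c in product(*sets)}


def star_sum(*sets):
    """Star sum: one summand from each set, all summands pairwise different."""
    return {sum(c) for c in product(*sets) if len(set(c)) == len(c)}


A1, A2, A3, A4 = {1}, {2}, {1}, {3}

nested = star_sum(star_sum(A1, A2), star_sum(A3, A4))   # {3} star {4}
flat = star_sum(A1, A2, A3, A4)                          # would need a_1 != a_3
```

Here `nested` contains 7, while `flat` is empty because the only candidates for $a_1$ and $a_3$ coincide; the ordinary sums agree, as expected.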

So far, the only way (we know) to verify Theorem 8.13 is to repeat the proof of
Theorem 7.1 with appropriate modifications. This is a tedious task, but no essential
new arguments are required, and we thus omit the details. Let us, however, present
the variant of a step in the proof of Theorem 7.1, Lemma 7.9, in order to give the
reader an idea about the kind of modifications one needs to carry out.

Lemma 8.14. Let $A_i$, $1 \le i \le 20\log_2 |A|$, be finite sets of real numbers with
the same cardinality $|A|$, where $|A|$ is sufficiently large. Then there is an integer
$1 \le T \le 10\log_2 |A|$ and elements $a_1 \in A_1, a_2 \in A_2, \ldots, a_{2T} \in A_{2T}$ such that all
$a_i$'s are different and the set $B = \{a_1, \ldots, a_{2T}\}$ satisfies

$$|T^*B| \ge |A|.$$

Proof of Lemma 8.14. We assume that $|A|$ is sufficiently large so that $|A| \ge
100\log_2 |A|$. We choose $a_1$ and $a_2$ from $A_1$ and $A_2$, respectively, with the only
condition that $a_1 \ne a_2$. Once $a_1, \ldots, a_{2i}$ have been chosen, we next choose $a_{2i+1}$
and $a_{2i+2}$ from $A_{2i+1} \setminus \{a_1, \ldots, a_{2i}\}$ and $A_{2i+2} \setminus \{a_1, \ldots, a_{2i}\}$ so that $a_{2i+1} \ne a_{2i+2}$
and

$$|(i + 1)^*\{a_1, \ldots, a_{2i+1}, a_{2i+2}\}| \ge 1.1\,|i^*\{a_1, \ldots, a_{2i}\}| \qquad (58)$$

(if there are many possible pairs, we choose an arbitrary one). We stop at time
$T$ when $|T^*\{a_1, \ldots, a_{2T}\}| \ge |A|$ and set $B = \{a_1, \ldots, a_{2T}\}$. It is clear that
$|B| \le 2\log_{1.1} |A| \le 20\log_2 |A|$. The only point we need to make is to show that
as long as $|i^*\{a_1, \ldots, a_{2i}\}| < |A|$, we can always find a pair $(a_{2i+1}, a_{2i+2})$ to satisfy
(58). Assume (for a contradiction) that we get stuck at the $i$-th step and denote
by $S$ the sum set $i^*\{a_1, \ldots, a_{2i}\}$. For any two numbers $a \in A_{2i+1} \setminus \{a_1, \ldots, a_{2i}\}$, $a' \in
A_{2i+2} \setminus \{a_1, \ldots, a_{2i}\}$ the union $(a + S) \cup (a' + S)$ is a subset of $(i + 1)^*\{a_1, \ldots, a_{2i}, a, a'\}$.
So by the assumption we have

$$|(a + S) \cup (a' + S)| < 1.1\,|S|.$$

Since both $a + S$ and $a' + S$ have $|S|$ elements, it follows that their intersection
has at least $.9|S|$ elements. This implies that the equation $a' - a = x - y$ has at
least $.9|S|$ solutions $(x, y)$ where $x \in S$ and $y \in S$. Now let us fix $a$ as the smallest
element of $A_{2i+1} \setminus \{a_1, \ldots, a_{2i}\}$ and choose $a'$ arbitrarily from $A_{2i+2} \setminus \{a_1, \ldots, a_{2i}, a\}$
(we exclude $a$ from $A_{2i+2}$ so we are guaranteed that $a \ne a'$). There are at least
$|A| - 2i - 1 \ge .9|A|$ choices for $a'$, each of which generates at least $.9|S|$ pairs $(x, y)$
where both $x$ and $y$ are elements of $S$. As all $(x, y)$ pairs are different, we have that

$$.9|A| \times .9|S| \le |S|^2,$$

which implies that $|S| > |A|$, a contradiction. This concludes the proof. □
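The selection procedure in this proof can be run on a toy instance. In the Python sketch below, all twenty sets are taken to be $\{1, \ldots, 16\}$, which is far too small for the hypotheses of the lemma, so the early exit marked in the code is a safeguard rather than something the lemma predicts; the function names and the rule of taking the first admissible pair are illustrative choices.

```python
from itertools import combinations


def star(B, k):
    """k*B: the set of sums of k distinct elements of B."""
    return {sum(c) for c in combinations(B, k)}


def greedy_selection(sets, target, growth=1.1):
    """Pick a_{2i+1} from sets[2i] and a_{2i+2} from sets[2i+1] so that (58) holds."""
    chosen, T, size = [], 0, 1      # |0*{}| = 1 (only the empty sum)
    while size < target and 2 * T + 1 < len(sets):
        taken = set(chosen)
        step = None
        for a in sorted(sets[2 * T] - taken):
            for b in sorted(sets[2 * T + 1] - taken - {a}):
                new_size = len(star(chosen + [a, b], T + 1))
                # The very first pair only needs a != b; later pairs need growth.
                if T == 0 or new_size >= growth * size:
                    step = (a, b, new_size)
                    break
            if step is not None:
                break
        if step is None:
            break                   # stuck; excluded by the lemma only for large |A|
        a, b, size = step
        chosen += [a, b]
        T += 1
    return chosen, T, size


sets = [set(range(1, 17)) for _ in range(20)]
B, T, size = greedy_selection(sets, target=16)
```

On this instance the procedure picks $1, 2, \ldots, 8$ and stops at $T = 4$, where the sums of four distinct elements fill the interval from 10 to 26.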



9. Erdős' conjecture on complete sequences

In 1962, Erdős introduced the following notion, which has since become quite popular:
An infinite set $A$ of positive integers is complete if every sufficiently large
positive integer can be represented as a sum of different elements of $A$ (see Section 6
of [9] or Section 4.3 of [23] for surveys about completeness). For instance,
Vinogradov's result (mentioned in the Overview) implies that the set of primes is
complete. On the other hand, there is a big difference between the study of complete
sequences and the study of classical problems of Vinogradov-Waring type. For
completeness, we do not require the number of summands in a representation to be
the same. This relaxation leads to a quite different kind of results. For problems
of Vinogradov-Waring type (where the number of summands is fixed), one usually
requires a very precise description of the sequence (the set of primes or the set of
squares, say). For problems concerning complete sequences, it has turned out that there
is much more flexibility.

What would be the first condition for a sequence to be complete? Well, density
must be the answer, as one cannot hope to represent every positive integer with a
very sparse sequence. But one would also notice instantly that density itself would
not be enough: The set of even numbers has very high density, but is clearly not
complete. This shows that one should also consider a condition involving modularity.
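To make the two necessary conditions concrete, the following minimal Python sketch computes which integers up to a cutoff are sums of distinct elements of a finite truncation of a set. The helper name `subset_sums` and the cutoff 40 are illustrative assumptions, not notation from the paper, and a finite computation of this kind can only suggest, not prove, completeness.

```python
def subset_sums(elements, limit):
    """Sums of nonempty sets of distinct positive elements, restricted to 1..limit."""
    reachable = 1                      # bit s set <=> s is a subset sum (bit 0: empty sum)
    mask = (1 << (limit + 1)) - 1      # discard sums above the cutoff
    for a in elements:
        reachable |= (reachable << a) & mask
    return {s for s in range(1, limit + 1) if (reachable >> s) & 1}


LIMIT = 40                             # illustrative cutoff (an assumption)
evens = list(range(2, LIMIT + 1, 2))

even_sums = subset_sums(evens, LIMIT)
mixed_sums = subset_sums([1] + evens, LIMIT)

# Integers in 1..LIMIT that are NOT representable in each case.
print(sorted(x for x in range(1, LIMIT + 1) if x not in even_sums))
print(sorted(x for x in range(1, LIMIT + 1) if x not in mixed_sums))
```

With the even numbers alone, every odd integer is missed; adding the single element 1 removes the obstruction modulo 2, and nothing in the range is missed.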

In number theory it happens quite frequently that the obvious necessary conditions
are also sufficient. In 1962, Erdős made the following conjecture.

Conjecture 9.1. There is a constant $c$ such that the following holds. Any increasing
sequence $A = \{a_1 < a_2 < a_3 < \dots\}$ satisfying

(a) $A(n) \ge cn^{1/2}$

(b) $S_A$ contains an element of every infinite arithmetic progression,

is complete.

Here and later $A(n)$ denotes the number of elements of $A$ not exceeding $n$. The
bound on $A(n)$ is best possible, up to the constant factor $c$, as shown by Cassels
[4].

Erdős [8] proved that the statement of the conjecture holds if one replaces (a) by
the stronger condition that $A(n) \ge cn^{(\sqrt{5}-1)/2}$. An important step was later made
by Folkman [14], who improved Erdős' result by showing that $A(n) \ge cn^{1/2+\epsilon}$ is
sufficient, for any positive constant $\epsilon$. The first and simpler part in Folkman's proof
is to remove the condition (b). He showed that any sequence satisfying (b) could be
partitioned into two subsequences with the same density, one of which still satisfies
(b). In the next and critical step, Folkman showed that if $A$ is a sequence with
density at least $n^{1/2+\epsilon}$ then $S_A$ contains an infinite arithmetic progression (in other
words, $A$ is subcomplete). His result follows immediately from these two steps.
Folkman's proof, naturally, led him to the following conjecture, which is perhaps
even more to the point than Conjecture 9.1.

Conjecture 9.2. There is a constant $c$ such that the following holds. Any increasing
sequence $A = \{a_1 < a_2 < a_3 < \dots\}$ satisfying $A(n) \ge cn^{1/2}$ is subcomplete.

Folkman's result has further been strengthened recently by Hegyvári [18] and
Łuczak and Schoen [21], who (independently) reduced the density $n^{1/2+\epsilon}$ to $cn^{1/2}\log^{1/2} n$,
using the result of Sárközy (see Section 3).

In a previous paper [28], we proved Conjecture 9.2. However, we have decided to discuss
this problem here for pedagogical reasons. It would be more useful for the reader to
consider this problem together with Conjecture 6.1 and under the general sufficient
condition proved in Section 6. As a matter of fact, given this sufficient condition,
it is now very simple to prove Conjecture 9.2. The only modification one needs to
make is to replace Lemma 6.10 by the following.

Lemma 9.3. There is a constant $C$ such that the following holds. If $A$ is a set of
different positive integers between 1 and $n$ and $|A| \ge C\sqrt{n}$, then $S_A$ contains an
arithmetic progression of length $n$.

The rest of the proof is the same. 

Theorem 9.4. There is a constant $c$ such that the following holds. Any increasing
sequence $A = \{a_1 < a_2 < a_3 < \dots\}$ satisfying $A(n) \ge cn^{1/2}$ is subcomplete.

Let us conclude with a comment on Conjecture 9.2 and Conjecture 6.1. These
conjectures look quite similar, which comes as no surprise as they appeared in the
same paper. The interesting point here is that the proof of Conjecture 6.1 requires
only Theorem 5.1, which is an easy application of Theorem 3.12, but the proof of
Conjecture 9.2 requires the much harder Theorem 7.1. On the other hand, prior
to our study, Conjecture 6.1 seemed harder to attack and fewer partial results were
known.

Remark. We have recently been informed by Lev (private communication) that
Chen [6] also proved Theorem 9.4, using a different method.



10. Arithmetic progressions in finite fields 

In this section we assume that $n$ is a prime. We are going to extend our previous
theorems to arithmetic progressions modulo $n$. The quantitative statements in
these theorems will change slightly, but the proofs remain essentially the same. We
first establish the results and then describe an application.



10.1. Results. In order to show why we need a modification in the statements
of the theorems, let us consider the proof of Theorem 3.12. At one point in the
proof (see the paragraph following (17)), we used the fact that $lA$ is a subset of
the interval $[ln]$ and thus has cardinality at most $ln$. In the finite field case, $lA$ is
always a subset of the set of residues modulo $n$ and so its cardinality is always at
most $n$, no matter how large $l$ is. This suggests that we should gain an extra factor
$l$ in the assumption of the theorem, and this has indeed turned out to be the case.
The analogue of Theorem 3.12 is as follows.

Theorem 10.2. For any fixed positive integer $d$ there are positive constants $C$
and $c$ depending on $d$ such that the following holds. Let $n$ be a prime and $l$ be a
positive integer and $A$ be a set of residues modulo $n$ such that $l^{d+1}|A| \ge Cn$. Then
the sumset $lA$ (modulo $n$) contains an arithmetic progression (modulo $n$) of length
$\min\{n, cl|A|^{1/d}\}$.
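For small primes the objects in the finite field setting can be explored by brute force. The Python sketch below computes $lA$ modulo $n$ and the length of the longest arithmetic progression modulo $n$ that it contains. The function names and the toy parameters are assumptions made for illustration, and nothing here reflects the constants $C$ and $c$ of the theorem.

```python
def sumset_mod(A, l, n):
    """Residues that are sums of l (not necessarily distinct) elements of A modulo n."""
    current = {0}
    for _ in range(l):
        current = {(s + a) % n for s in current for a in A}
    return current


def longest_ap_mod(S, n):
    """Length of the longest arithmetic progression modulo a prime n inside S."""
    S = {x % n for x in S}
    if len(S) in (0, n):
        return len(S)
    best = 1
    for d in range(1, n // 2 + 1):      # differences d and n - d give the same progressions
        flags = [(k * d) % n in S for k in range(n)]
        run = 0
        for hit in flags + flags:       # the doubled list handles wrap-around
            run = run + 1 if hit else 0
            best = max(best, run)
    return best


n, A = 7, {0, 1}                        # toy example (an assumption)
for l in (1, 3, 6):
    S = sumset_mod(A, l, n)
    print(l, sorted(S), longest_ap_mod(S, n))
```

Once the sumset covers every residue the progression length is capped at $n$, which is the reason a minimum with $n$ appears in the finite field statement.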

There are two modifications in Theorem 10.2 (compared with Theorem 3.12). First
we changed $l^d$ to $l^{d+1}$, which is consistent with the above discussion. Second, we
changed the lower bound from $cl|A|^{1/d}$ to $\min\{n, cl|A|^{1/d}\}$. This modification is
natural and justified, as $lA$ can have at most $n$ elements. We shall comment on
this at the end of the next paragraph.

The proof of Theorem 10.2 is the same as the proof of Theorem 3.12; the only place
one needs a (formal) modification is (17). In this inequality, the rightmost formula
should be $Cn$ instead of $Cln$, which is consistent with the discussion in the paragraph
preceding Theorem 10.2. Freiman's theorem and all lemmas used for the
proof of Theorem 3.12 hold for residue classes (see [27] for exact statements). To
explain the change in the lower bound, notice that in the proof of Theorem 3.12
we actually showed that either $lA = [ln]$ or $lA$ contains an arithmetic progression
of length $cl|A|^{1/d}$. Its finite field analogue says that either $lA$ contains all residues
modulo $n$ or it contains an arithmetic progression of length $cl|A|^{1/d}$. In Theorem
3.12, it is unnecessary to state the lower bound as $\min\{ln, cl|A|^{1/d}\}$ because $ln$ is
always larger than $cl|A|^{1/d}$. On the other hand, in the finite field case, it makes
sense to write $\min\{n, cl|A|^{1/d}\}$ since $n$ can be smaller than $cl|A|^{1/d}$.

Theorem 10.2 demonstrates the flexibility of our method. It is not clear, for
instance, how to prove a finite field version of Theorem 3.3 (which is a special case
of Theorem 3.12) using the original approaches of Freiman and Sárközy.

Similar to Theorem 3.12, Theorem 10.2 is sharp. One can modify the general
construction in Section 3 to match the lower bound. This construction also mirrors
the extra term $l$.

A construction modulo $n$. We present a modification of the principal construction
in Section 3. Now set a = L ^"^^^i/'"^' j (notice the extra $l$ in the numerator) and

^ ~ L(rfipjT7rf)^^'''* "^^J. Notice that under the assumption of Theorem 10.2, (1)
still holds with the new definition of $a$. We again have two cases:

(I) $\sum_{i=1}^{d} r_i \equiv 0 \pmod{n}$. By the definition of the $a_i$'s, it follows that
$\sum_{i=1}^{d} r_i b_i \equiv 0 \pmod{n}$ and $d$ should be at least 3. By the definition of the $b_i$'s, it follows
immediately that



max Inl > min( min — ) > -ai/(''-i) > 2l\A\^/'^, 



where the last inequality is from (1). 

(II) $\sum_{i=1}^{d} r_i \not\equiv 0 \pmod{n}$. In this case, we have

$$\sum_{j=1}^{d} r_j a + \sum_{j=1}^{d} r_j b_j = pn$$

for some integer $p$. If $p = 0$, then



max IrJ > > -a^'^'^'^^ > 2l\A\^/'^. (60) 



If $p \ne 0$, then



max \n\ > > jT-^ > ia^^^"-^^ > 2l\A\^/<'. (61) 




Without any further explanation, we now state the analogues of Theorems 5.1, 7.1 
and 8.13. 

Theorem 10.3. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $A_1, \dots, A_l$ be sets of residue
classes modulo $n$ of size $|A|$ where $l$ and $|A|$ satisfy $l^{d+1}|A| \ge Cn$. Then $A_1 + \cdots + A_l$
either contains all residue classes modulo $n$ or contains a proper GAP of rank $d'$
and volume at least $cl^{d'}|A|$, for some integer $1 \le d' \le d$.

Theorem 10.4. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $n$ be a prime and $l$ be a positive
integer and $A$ be a set of residues modulo $n$ such that $l^{d+1}|A| \ge Cn$. Then $l^*A$
either contains all residue classes modulo $n$ or contains a proper GAP of rank $d'$
and volume at least $cl^{d'}|A|$, for some integer $1 \le d' \le d$.

Theorem 10.5. For any fixed positive integer $d$ there are positive constants $C$ and
$c$ depending on $d$ such that the following holds. Let $n$ be a prime and $l$ be a positive
integer and $A_1, \dots, A_l$ be sets of residues modulo $n$ such that $|A_1| = \cdots = |A_l| = |A|$
and $l^{d+1}|A| \ge Cn$. Then $A_1 \overset{*}{+} \cdots \overset{*}{+} A_l$ either contains all residue classes modulo
$n$ or contains a proper GAP of rank $d'$ and volume at least $cl^{d'}|A|$, for some integer
$1 \le d' \le d$.



10.6. An application. A set $A$ of residues modulo $n$ is called zero-sum-free if
no nonempty subset of $A$ adds up to zero modulo $n$. Zero-sum-free sets are objects
of considerable interest in additive number theory (see Section C of [17] and the
references therein). Here we address the following basic question:

How many zero-sum-free sets are there?

We denote by $S_A$ the collection of partial sums of $A$, so $A$ is zero-sum-free if and
only if $0 \notin S_A$. Szemerédi [26] and Olson [22], answering a question of Erdős, proved
that a zero-sum-free set has at most $2n^{1/2}$ elements. This implies that the number
of zero-sum-free sets is at most

$$\sum_{i=1}^{2n^{1/2}} \binom{n}{i}.$$

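It is not hard to give a lower bound of $2^{\Omega(\sqrt{n})}$; notice that every subset of the
interval $[\lfloor \sqrt{2n} - 1 \rfloor]$ is zero-sum-free, since

$$\sum_{i=1}^{\lfloor \sqrt{2n} - 1 \rfloor} i < n.$$

The number of subsets of the above interval is clearly $2^{\Omega(\sqrt{n})}$.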
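The lower-bound observation is easy to test numerically. The following is a minimal Python sketch, assuming a hypothetical helper `is_zero_sum_free` and the small prime $n = 11$, both chosen only for illustration.

```python
from itertools import combinations
from math import isqrt


def is_zero_sum_free(A, n):
    """True if no nonempty subset of the distinct residues in A sums to 0 modulo n."""
    sums = set()                        # residues of nonempty subset sums so far
    for a in A:
        a %= n
        sums |= {(s + a) % n for s in sums} | {a}
        if 0 in sums:
            return False
    return True


n = 11                                  # illustrative prime (an assumption)
m = isqrt(2 * n) - 1                    # floor(sqrt(2n) - 1)
interval = range(1, m + 1)

# Check every subset of {1, ..., m}, including the empty one.
all_free = all(
    is_zero_sum_free(subset, n)
    for size in range(m + 1)
    for subset in combinations(interval, size)
)
print(m, all_free)
print(is_zero_sum_free({1, 10}, n))     # 1 + 10 = 11 is 0 modulo 11
```

The check succeeds because the largest possible subset sum, $1 + 2 + \cdots + m$, stays below $n$, so no reduction modulo $n$ ever takes place.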
In an earlier paper [27], we succeeded in establishing a sharp bound, using a weaker
version of Theorem 10.5. (To be more precise, what we actually used was a weaker
version of the finite field analogue of Theorem 3.8.)

Theorem 10.7. Let $n$ be a prime. The number of zero-sum-free sets (mod $n$) is
$2^{(\sqrt{1/3}\,\pi\log_2 e + o(1))\sqrt{n}}$.

This surprising estimate might deserve an explanation. To reveal its origin, let us
give a short proof for the lower bound. We call a set $A$ of positive integers $n$-small
if the sum of the elements in $A$ is less than $n$. It is trivial that an $n$-small set is
zero-sum-free. On the other hand, the number of $n$-small sets is $2^{(\sqrt{1/3}\,\pi\log_2 e + o(1))\sqrt{n}}$
due to the following lemma, which is a well-known result in the theory of partitions
(see, for instance, Theorem 6.7 in [1]).

Lemma 10.8. The number of representations of $n$ as a sum of different positive
integers is $2^{(\sqrt{1/3}\,\pi\log_2 e + o(1))\sqrt{n}}$. Consequently, the number of $n$-small sets is
$2^{(\sqrt{1/3}\,\pi\log_2 e + o(1))\sqrt{n}}$.
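The growth rate in the partition lemma can be examined with a short dynamic program. The Python sketch below counts representations as sums of distinct positive integers and, from these, the $n$-small sets (the empty set is counted, which is an assumption of this sketch); the function names are illustrative. Convergence of the normalized exponent to $\sqrt{1/3}\,\pi\log_2 e \approx 2.617$ is slow because of polynomial factors, so the printed ratios should be read only as a trend.

```python
from math import log2, sqrt, pi, e


def distinct_partition_counts(limit):
    """counts[k] = number of ways to write k as a sum of distinct positive integers."""
    counts = [0] * (limit + 1)
    counts[0] = 1                       # the empty sum
    for part in range(1, limit + 1):
        for total in range(limit, part - 1, -1):   # descending: each part used at most once
            counts[total] += counts[total - part]
    return counts


def count_n_small_sets(n):
    """Number of sets of positive integers whose element sum is less than n."""
    return sum(distinct_partition_counts(n - 1))


target = pi / sqrt(3) * log2(e)         # the constant sqrt(1/3) * pi * log2(e)
for n in (100, 400, 1600):
    ratio = log2(count_n_small_sets(n)) / sqrt(n)
    print(n, round(ratio, 3), round(target, 3))
```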

The hard part of Theorem 10.7 is the upper bound. Using our results on long
arithmetic progressions (modulo $n$) we managed to show that if $A$ is zero-sum-free
and has relatively many elements (the number of sets with at most $n^{1/2}/\log^2 n$
elements is $2^{o(n^{1/2})}$ so we can ignore these sets), then $A$ is close to being $n$-small (for
the exact statement please see [27]). The general idea is as follows. Let $A'$ be a
relatively small subset of $A$; our results show that $S_{A'}$ contains a quite long arithmetic
progression. We next make many translations of this arithmetic progression
by adding to it elements from $A \setminus A'$. If all these translations avoid 0, then we have
a good chance to deduce a structural property of $A$, and it turned out that typically
$A$ should look like an $n$-small set. A similar argument can be applied to determine
the number of $x$-sum-free sets, for any non-zero residue class $x$. Trying not to spoil
the fun, we do not state the theorem here (it can be found in [27]), but let us mention
that the bound for non-zero $x$ is different from the bound in Theorem 10.7.
Guessing this bound is a nice puzzle that the reader who has borne with us until this very
end might enjoy.